Left Join Killing Me

  • Hi,

    I have a query where the last left join is killing the performance. Someone suggested me to replace the last corelated subquery with a CTE but being a DBA and limited skills in T-SQL, I request help in this forum.

    Here is the query:

    CREATE VIEW [dbo].[vw_ScheduledReceipts]

    AS

    SELECT

    term.StateCode AS OriginState

    ,dest_ad .[State] AS DestinationState

    ,ImpExp = CASE

    WHEN term.StateCode=ord.DestState THEN 0

    ELSE 1

    END

    ,Carrier# = CASE

    WHEN ord.CarrierID >= 2000 THEN ord.CarrierID

    ELSE 999

    END

    ,car.CarrierCode AS CarrierName

    ,car.Descr AS CarrierDesc

    ,vend2.FederalID AS CarrierFederalID

    ,term.Descr AS Origin

    ,LTRIM(RTRIM(dest_ad.[City]))+ ', ' + LTRIM(RTRIM(dest_ad.[state])) AS Destination

    ,sup.SupplierID AS Supplier#

    ,sup.Descr AS SupplierDesc

    ,vend.FederalID AS SupplierFederalID

    ,SupplierName = CASE

    WHEN vend.Descr IS NULL THEN sup.Descr

    ELSE vend.Descr

    END

    ,CAST(CONVERT(VARCHAR,ord.PostedDate,101) AS SMALLDATETIME) AS PostedDate

    ,MONTH(ord.PostedDate) AS [Month]

    ,DATEPART(Q,ord.PostedDate) AS [Qtr]

    ,YEAR(ord.PostedDate) AS [Year]

    ,ordproditm.BOL AS BOL#

    ,prod.ProdCategory AS ProductCategory

    ,[Type] = CASE

    WHEN prod.ProdCategory = 'X' THEN prod.ProdCategory + '-' + prod.ProdCode

    WHEN prod.ProdCategory = 'K' AND prod.ProdCode = '11' AND ord.DestState <> 'OH' THEN 'D'

    WHEN prod.ProdCategory = 'K' AND prod.ProdCode = '11' AND ord.DestState= 'OH' THEN 'K'

    WHEN prod.ProdCategory = 'D' AND prod.ProdCode = '87' THEN 'X-97'

    ELSE prod.ProdCategory

    END

    ,prod.ProdCode AS ProductCode

    ,prod.Descr AS ProductDesc

    ,shpprod.TaxExemptReason AS CustomerType

    ,supterm.PayFlag AS PayFlag

    ,SUM(CAST(ordproditm.GrossGallons AS DECIMAL(17,6))) AS SumOfGrossGallons

    ,SUM(CAST(ordproditm.NetGallons AS DECIMAL(17,6))) AS SumOfNetGallons

    ,SumOfBilledGallons = CASE

    WHEN ordprod.BillCode = 'N' THEN SUM(CAST(ordproditm.NetGallons AS DECIMAL(17,6)))

    WHEN ordprod.BillCode = 'G' THEN SUM(CAST(ordproditm.GrossGallons AS DECIMAL(17,6)))

    ELSE 0

    END

    ,ordprod.BillCode AS BillCode

    ,cust.CustNum AS Customer#

    ,cust.CustName AS CustomerName

    ,term.TermCode AS TerminalCode

    ,dest_ad.County AS County

    ,cust.FEIN AS [CustomerFederalIdentification#]

    ,TerminalControl# = CASE

    WHEN ISNULL(tcn.TerminalNumber,'')='' THEN ''

    ELSE tcn.TerminalNumber

    END

    ,shpprod.StateBillByGN AS StateTaxGrossorNet

    , BillFreight = CASE

    WHEN CAST(shpprod.BillFreight AS INT)=1 THEN 'Y'

    ELSE 'N'

    END

    ,ord.DeliveryTicketNo AS DeliveryTicket#

    ,co.oaid as COA_CompanyID

    ,co.Name as CompanyName

    ,co.Descr as CompanyDescription

    ,co.Companyid as CompanyID

    FROM Orders ord

    INNER JOIN MOC_Replication.dbo.OrderProduct ordprod

    ON ordprod.OrderID = ord.OrderID

    INNER JOIN MOC_Replication.dbo.ShipToProduct shpprod

    ON shpprod.ShipToProdID = ordprod.ShipToProdID

    INNER JOIN MOC_Replication.dbo.ShipTo shpto

    ON shpto.ShipToID = ShpProd.ShipToID

    INNER JOIN MOC_Replication.dbo.Carrier car

    ON car.CarrierID = ord.CarrierID

    INNER JOIN MOC_Replication.dbo.OrderProdItem ordproditm

    ON ordproditm.OrderProdID = ordprod.OrderProdID

    INNER JOIN MOC_Replication.dbo.ShipToItem shpitm

    ON shpitm.ShipToProdID=ShpProd.ShipToProdID

    AND ordproditm.ShipToItemID = shpitm.ShipToItemID

    INNER JOIN MOC_Replication.dbo.Product prod

    ON prod.ProductID = shpitm.ProductID

    INNER JOIN MOC_Replication.dbo.Terminals term

    ON term.TerminalID = ordproditm.TerminalID

    INNER JOIN MOC_Replication.dbo.Customer cust

    ON cust.CustomerID = shpto.CustomerID

    INNER JOIN MOC_Replication.dbo.Company co

    ON co.Companyid = ord.CompanyID

    LEFT JOIN MOC_Replication.dbo.[Address] dest_ad

    ON dest_ad.addressid = shpto.addressid

    INNER JOIN MOC_Replication.dbo.Supplier sup

    ON sup.SupplierID = ordproditm.SupplierID

    LEFT JOIN MOC_Replication.dbo.SupplierTerminal supterm

    ON supterm.SupplierID=ordproditm.SupplierID

    AND supterm.TerminalID = ordproditm.TerminalID

    INNER JOIN MOC_Replication.dbo.Vendor vend

    ON vend.VendorID = sup.VendorID

    INNER JOIN MOC_Replication.dbo.vendor vend2

    ON vend2.VendorID = car.VendorID

    LEFT OUTER JOIN MOC_DT.dbo.TCN_Xref tcn

    ON tcn.recid = (SELECT MIN(recid) FROM MOC_DT.dbo.TCN_Xref

    WHERE DsLocationCode = CAST(term.TerminalID AS VARCHAR)

    AND (sup.SupplierCode LIKE DsSupcode + '-%')

    )

    WHERE (sup.SupplierID NOT BETWEEN 9993 AND 9999)

    AND ord.OrderStatus = 'P'

    GROUP BY

    term.StateCode

    ,dest_ad .[State]

    ,ord.DestState

    ,ord.CarrierID

    ,car.CarrierCode

    ,car.Descr

    ,vend2.FederalID

    ,term.Descr

    ,dest_ad.[City]

    ,dest_ad.[state]

    ,sup.SupplierID

    ,sup.Descr

    ,vend.FederalID

    ,vend.Descr

    ,ord.PostedDate

    ,ordproditm.BOL

    ,prod.ProdCategory

    ,prod.ProdCode

    ,prod.Descr

    ,shpprod.TaxExemptReason

    ,supterm.PayFlag

    ,ordprod.BillCode

    ,cust.CustNum

    ,cust.CustName

    ,term.TermCode

    ,dest_ad.county

    ,cust.FEIN

    ,tcn.TerminalNumber

    ,shpprod.StateBillByGN

    ,shpprod.BillFreight

    ,ord.DeliveryTicketNo

    ,co.oaid

    ,co.Name

    ,co.Descr

    ,co.Companyid

    Kindly let me know if it is possible to get rid of the last subquery. Forgive me if my question is incomplete.

    Thanks in advance!

    Chandan

  • You've been around long enough that you should know that we're going to ask for DDL, sample data, and the actual execution plan.

    I think that part of the problem is that you're joining a bunch of tables and then grouping on almost EVERYTHING. It looks like the only items being aggregated are in the OrderProdItem table, so you could potentially aggregate that table before joining to the other tables.

    Also, look at replacing your last LEFT OUTER JOIN with an OUTER APPLY using TOP (1).

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Anoter thing to try while you gather your DDL and execution plan is to move your where condition into the join condition for that table.

    Jared
    CE - Microsoft

  • drew.allen (5/10/2012)


    You've been around long enough that you should know that we're going to ask for DDL, sample data, and the actual execution plan.

    I think that part of the problem is that you're joining a bunch of tables and then grouping on almost EVERYTHING. It looks like the only items being aggregated are in the OrderProdItem table, so you could potentially aggregate that table before joining to the other tables.

    Also, look at replacing your last LEFT OUTER JOIN with an OUTER APPLY using TOP (1).

    Drew

    I apologize but I did not think it was required. I am attaching the DDL scripts and execution plan. Following is the actual query fired:

    SELECT "Tbl1002"."OriginState" "Col1006","Tbl1002"."DestinationState" "Col1007",

    "Tbl1002"."CarrierDesc" "Col1011","Tbl1002"."Origin" "Col1013",

    "Tbl1002"."Destination" "Col1014","Tbl1002"."SupplierFederalID" "Col1017",

    "Tbl1002"."SupplierName" "Col1018","Tbl1002"."PostedDate" "Col1019",

    "Tbl1002"."BOL#" "Col1023","Tbl1002"."Type" "Col1025",

    "Tbl1002"."SumOfGrossGallons" "Col1030","Tbl1002"."SumOfNetGallons" "Col1031",

    "Tbl1002"."BillCode" "Col1033","Tbl1002"."CustomerName" "Col1035"

    FROM "MOC_REPLICATION"."dbo"."vw_ScheduledReceipts" "Tbl1002"

    WHERE ("Tbl1002"."OriginState"='NY' OR "Tbl1002"."DestinationState"='NY')

    AND "Tbl1002"."Month"=(4) AND "Tbl1002"."Year"=(2012)

    AND "Tbl1002"."CompanyID"=(1)

    Thanks

    Chandan

  • I still don't see any sample data. It's hard to rewrite your view if we don't have data to compare the original results with the revised results.

    Also, please pick one method for aliasing columns and stick with it. Your view has both <expression> AS <alias> and <alias> = <expression>. It makes it much harder to read when you're inconsistent.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (5/11/2012)


    I still don't see any sample data. It's hard to rewrite your view if we don't have data to compare the original results with the revised results.

    Also, please pick one method for aliasing columns and stick with it. Your view has both <expression> AS <alias> and <alias> = <expression>. It makes it much harder to read when you're inconsistent.

    Drew

    I am afraid that if I put some sample data, I may violate conditions of my client. Thanks for your help and time. Somebody suggested me that the left join can be calculated as a CTE in the begining and that CTE can replace the subquery. but being a DBa, I have limited skills in T-SQL(still learning).

    Do you have any advice to offer here.

    Regards

    Chandan

  • chandan_jha18 (5/14/2012)


    drew.allen (5/11/2012)


    I still don't see any sample data. It's hard to rewrite your view if we don't have data to compare the original results with the revised results.

    Also, please pick one method for aliasing columns and stick with it. Your view has both <expression> AS <alias> and <alias> = <expression>. It makes it much harder to read when you're inconsistent.

    Drew

    I am afraid that if I put some sample data, I may violate conditions of my client. Thanks for your help and time. Somebody suggested me that the left join can be calculated as a CTE in the begining and that CTE can replace the subquery. but being a DBa, I have limited skills in T-SQL(still learning).

    Do you have any advice to offer here.

    Regards

    Chandan

    My advice is to provide sample data. Sample meaning that it does not contain real data. You cannot possibly violate any confidentiality agreement by making up some simple data.

    Jared
    CE - Microsoft

  • You've got multiple full scans running on millions of rows. Until you make the query filter that data, performance is going to stink and there's almost nothing you can do about it. You've also got some fundamental problems in the code. For example:

    CAST(term.TerminalID AS VARCHAR)

    Will lead to scans on your index. You can't have functions on columns in WHERE or JOIN criteria. And, if the two data types are different, why are they being used as JOIN criteria? That's not a T-SQL error. That's a design error. I'd fix it at the database structure level first, then worry about the T-SQL. After that, I'm with everyone else, why are you aggregating based on 20 columns? Are you doing that because that is the actual grouping criteria? Or, are you doing that because SQL Server forced you to add those columns to the GROUP BY statement since no aggregations were performed on them? If it's the latter, what is the real aggregate criteria? Make that into a derived table (doesn't need to be a CTE. CTEs are not magic. It can just be a sub-select similar to what you've done with your OUTER JOIN) then join that to the rest of the query.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • i think this should give the same results and be quicker

    but without data i can't prove it

    LEFT OUTER JOIN

    (SELECT DsSupcode,DsLocationCode,TerminalNumber,MIN(recid) as recidmin FROM MOC_DT.dbo.TCN_Xref

    WHERE

    AND (group by TerminalNumber,DsLocationCode,DsSupcode )

    ) tcn ON tcn.DsLocationCode = CAST(term.TerminalID AS VARCHAR) and sup.SupplierCode LIKE tcn.DsSupcode + '-%'

    it's also much cleaner code

    MVDBA

  • Grant Fritchey (5/14/2012)


    You've got multiple full scans running on millions of rows. Until you make the query filter that data, performance is going to stink and there's almost nothing you can do about it. You've also got some fundamental problems in the code. For example:

    CAST(term.TerminalID AS VARCHAR)

    Will lead to scans on your index. You can't have functions on columns in WHERE or JOIN criteria. And, if the two data types are different, why are they being used as JOIN criteria? That's not a T-SQL error. That's a design error. I'd fix it at the database structure level first, then worry about the T-SQL. After that, I'm with everyone else, why are you aggregating based on 20 columns? Are you doing that because that is the actual grouping criteria? Or, are you doing that because SQL Server forced you to add those columns to the GROUP BY statement since no aggregations were performed on them? If it's the latter, what is the real aggregate criteria? Make that into a derived table (doesn't need to be a CTE. CTEs are not magic. It can just be a sub-select similar to what you've done with your OUTER JOIN) then join that to the rest of the query.

    I agree with you Sir. Design made years ago haunt us now and we keep running here and there to keep performance right. If the developers had so much information over correct database designs years ago,like this is today, we DBAs can concentrate on T-SQL too but as you said T-SQL cannot do magic if we are forcing it to compare apples and oranges. It will give us the output because we paid for it but does not guarantee performance.

    Thanks

    Chandan

  • michael vessey (5/14/2012)


    i think this should give the same results and be quicker

    but without data i can't prove it

    LEFT OUTER JOIN

    (SELECT DsSupcode,DsLocationCode,TerminalNumber,MIN(recid) as recidmin FROM MOC_DT.dbo.TCN_Xref

    WHERE

    AND (group by TerminalNumber,DsLocationCode,DsSupcode )

    ) tcn ON tcn.DsLocationCode = CAST(term.TerminalID AS VARCHAR) and sup.SupplierCode LIKE tcn.DsSupcode + '-%'

    it's also much cleaner code

    thanks for taking your valuable time to look at it. I can verify the results but wanted to ask if you missed a condition in your where clause before 'AND' operator.

    Thanks

    Chandan

  • this is an old thread but would like to inform that I could resolve this issue by changing the view to a procedure. Thanks to everyone who pitched in with their ideas!

    Regards

    Chandan

  • The left join on the MIN is going to suck in any case, but these are the real killers here I bet:

    WHERE DsLocationCode = CAST(term.TerminalID AS VARCHAR)

    AND (sup.SupplierCode LIKE DsSupcode + '-%')

    the first a) voids the use of an index seek if one is appropriate and available and b) prevents the optimizer from getting good estmiates

    the second will also crush the optimizer's ability to get good estimates

    Speaking of estimates, there are SOOO many joins here the slightest skew can REALLY mess things up. The OR in the actual view call likewise.

    Looks like a Business Objects query, or from some other automated reporting tool. Yucko!! 🙂

    Lastly, one could spend tens of hours on something this complex to get it working well. I feel that this goes WAY beyond what one should expect from a free forum. Get yourself a professional tuner to help you out. That would avoid the issue of not being able to show real data too (which will ABSOLUTELY be required to get this one improved). This beast simply cannot be tuned with a few generated rows (which would be EXTRAORDINARILY difficult in any case given the number of tables involved).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (5/24/2012)


    The left join on the MIN is going to suck in any case, but these are the real killers here I bet:

    WHERE DsLocationCode = CAST(term.TerminalID AS VARCHAR)

    AND (sup.SupplierCode LIKE DsSupcode + '-%')

    the first a) voids the use of an index seek if one is appropriate and available and b) prevents the optimizer from getting good estmiates

    the second will also crush the optimizer's ability to get good estimates

    Speaking of estimates, there are SOOO many joins here the slightest skew can REALLY mess things up. The OR in the actual view call likewise.

    Looks like a Business Objects query, or from some other automated reporting tool. Yucko!! 🙂

    Lastly, one could spend tens of hours on something this complex to get it working well. I feel that this goes WAY beyond what one should expect from a free forum. Get yourself a professional tuner to help you out. That would avoid the issue of not being able to show real data too (which will ABSOLUTELY be required to get this one improved). This beast simply cannot be tuned with a few generated rows (which would be EXTRAORDINARILY difficult in any case given the number of tables involved).

    I apologize for posting such a long question with complex queries. As far as index scans and cast, convert functions are concerned, even a professional person would have given same advice as you experts did here. Schema is something that gets very very hard to change after few years of its inception. I tried that but was not approved. So I keep looking for some sort of band-aid 🙂

    In this case, I noted that my query had a where condition and the view had a where condition built in too. Somehow when sum() function was operating, it did not consider the where clause of the query and ended up doing this operation 1 million times. Then the where filter of the query was applied to filter maximum rows and display only 4k rows. So i suggested them to have a procedure inside which all filters can be fit.

    Apparently, the 'where' condition inside view was getting considered first and the 'where' condition of the query was coming into picture at a much much later stage whereas I expected SQL engine to combine both of them.

    Do you know why it happened that way?

    Regards

    Chandan

  • chandan_jha18 (5/24/2012)


    TheSQLGuru (5/24/2012)


    The left join on the MIN is going to suck in any case, but these are the real killers here I bet:

    WHERE DsLocationCode = CAST(term.TerminalID AS VARCHAR)

    AND (sup.SupplierCode LIKE DsSupcode + '-%')

    the first a) voids the use of an index seek if one is appropriate and available and b) prevents the optimizer from getting good estmiates

    the second will also crush the optimizer's ability to get good estimates

    Speaking of estimates, there are SOOO many joins here the slightest skew can REALLY mess things up. The OR in the actual view call likewise.

    Looks like a Business Objects query, or from some other automated reporting tool. Yucko!! 🙂

    Lastly, one could spend tens of hours on something this complex to get it working well. I feel that this goes WAY beyond what one should expect from a free forum. Get yourself a professional tuner to help you out. That would avoid the issue of not being able to show real data too (which will ABSOLUTELY be required to get this one improved). This beast simply cannot be tuned with a few generated rows (which would be EXTRAORDINARILY difficult in any case given the number of tables involved).

    I apologize for posting such a long question with complex queries. As far as index scans and cast, convert functions are concerned, even a professional person would have given same advice as you experts did here. Schema is something that gets very very hard to change after few years of its inception. I tried that but was not approved. So I keep looking for some sort of band-aid 🙂

    In this case, I noted that my query had a where condition and the view had a where condition built in too. Somehow when sum() function was operating, it did not consider the where clause of the query and ended up doing this operation 1 million times. Then the where filter of the query was applied to filter maximum rows and display only 4k rows. So i suggested them to have a procedure inside which all filters can be fit.

    Apparently, the 'where' condition inside view was getting considered first and the 'where' condition of the query was coming into picture at a much much later stage whereas I expected SQL engine to combine both of them.

    Do you know why it happened that way?

    Regards

    Chandan

    Yes, I do. Overly complex query. The optimizer can only do so much, and although it is TRULY AMAZING what it is capable of you are asking a LOT of it. You need to help it out sometimes. Given 15 years of tuning experience you learn all kinds of ways to help it out when required. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply