May 10, 2012 at 8:24 am
Hi,
I have a query where the last left join is killing the performance. Someone suggested me to replace the last corelated subquery with a CTE but being a DBA and limited skills in T-SQL, I request help in this forum.
Here is the query:
CREATE VIEW [dbo].[vw_ScheduledReceipts]
AS
SELECT
term.StateCode AS OriginState
,dest_ad .[State] AS DestinationState
,ImpExp = CASE
WHEN term.StateCode=ord.DestState THEN 0
ELSE 1
END
,Carrier# = CASE
WHEN ord.CarrierID >= 2000 THEN ord.CarrierID
ELSE 999
END
,car.CarrierCode AS CarrierName
,car.Descr AS CarrierDesc
,vend2.FederalID AS CarrierFederalID
,term.Descr AS Origin
,LTRIM(RTRIM(dest_ad.[City]))+ ', ' + LTRIM(RTRIM(dest_ad.[state])) AS Destination
,sup.SupplierID AS Supplier#
,sup.Descr AS SupplierDesc
,vend.FederalID AS SupplierFederalID
,SupplierName = CASE
WHEN vend.Descr IS NULL THEN sup.Descr
ELSE vend.Descr
END
,CAST(CONVERT(VARCHAR,ord.PostedDate,101) AS SMALLDATETIME) AS PostedDate
,MONTH(ord.PostedDate) AS [Month]
,DATEPART(Q,ord.PostedDate) AS [Qtr]
,YEAR(ord.PostedDate) AS [Year]
,ordproditm.BOL AS BOL#
,prod.ProdCategory AS ProductCategory
,[Type] = CASE
WHEN prod.ProdCategory = 'X' THEN prod.ProdCategory + '-' + prod.ProdCode
WHEN prod.ProdCategory = 'K' AND prod.ProdCode = '11' AND ord.DestState <> 'OH' THEN 'D'
WHEN prod.ProdCategory = 'K' AND prod.ProdCode = '11' AND ord.DestState= 'OH' THEN 'K'
WHEN prod.ProdCategory = 'D' AND prod.ProdCode = '87' THEN 'X-97'
ELSE prod.ProdCategory
END
,prod.ProdCode AS ProductCode
,prod.Descr AS ProductDesc
,shpprod.TaxExemptReason AS CustomerType
,supterm.PayFlag AS PayFlag
,SUM(CAST(ordproditm.GrossGallons AS DECIMAL(17,6))) AS SumOfGrossGallons
,SUM(CAST(ordproditm.NetGallons AS DECIMAL(17,6))) AS SumOfNetGallons
,SumOfBilledGallons = CASE
WHEN ordprod.BillCode = 'N' THEN SUM(CAST(ordproditm.NetGallons AS DECIMAL(17,6)))
WHEN ordprod.BillCode = 'G' THEN SUM(CAST(ordproditm.GrossGallons AS DECIMAL(17,6)))
ELSE 0
END
,ordprod.BillCode AS BillCode
,cust.CustNum AS Customer#
,cust.CustName AS CustomerName
,term.TermCode AS TerminalCode
,dest_ad.County AS County
,cust.FEIN AS [CustomerFederalIdentification#]
,TerminalControl# = CASE
WHEN ISNULL(tcn.TerminalNumber,'')='' THEN ''
ELSE tcn.TerminalNumber
END
,shpprod.StateBillByGN AS StateTaxGrossorNet
, BillFreight = CASE
WHEN CAST(shpprod.BillFreight AS INT)=1 THEN 'Y'
ELSE 'N'
END
,ord.DeliveryTicketNo AS DeliveryTicket#
,co.oaid as COA_CompanyID
,co.Name as CompanyName
,co.Descr as CompanyDescription
,co.Companyid as CompanyID
FROM Orders ord
INNER JOIN MOC_Replication.dbo.OrderProduct ordprod
ON ordprod.OrderID = ord.OrderID
INNER JOIN MOC_Replication.dbo.ShipToProduct shpprod
ON shpprod.ShipToProdID = ordprod.ShipToProdID
INNER JOIN MOC_Replication.dbo.ShipTo shpto
ON shpto.ShipToID = ShpProd.ShipToID
INNER JOIN MOC_Replication.dbo.Carrier car
ON car.CarrierID = ord.CarrierID
INNER JOIN MOC_Replication.dbo.OrderProdItem ordproditm
ON ordproditm.OrderProdID = ordprod.OrderProdID
INNER JOIN MOC_Replication.dbo.ShipToItem shpitm
ON shpitm.ShipToProdID=ShpProd.ShipToProdID
AND ordproditm.ShipToItemID = shpitm.ShipToItemID
INNER JOIN MOC_Replication.dbo.Product prod
ON prod.ProductID = shpitm.ProductID
INNER JOIN MOC_Replication.dbo.Terminals term
ON term.TerminalID = ordproditm.TerminalID
INNER JOIN MOC_Replication.dbo.Customer cust
ON cust.CustomerID = shpto.CustomerID
INNER JOIN MOC_Replication.dbo.Company co
ON co.Companyid = ord.CompanyID
LEFT JOIN MOC_Replication.dbo.[Address] dest_ad
ON dest_ad.addressid = shpto.addressid
INNER JOIN MOC_Replication.dbo.Supplier sup
ON sup.SupplierID = ordproditm.SupplierID
LEFT JOIN MOC_Replication.dbo.SupplierTerminal supterm
ON supterm.SupplierID=ordproditm.SupplierID
AND supterm.TerminalID = ordproditm.TerminalID
INNER JOIN MOC_Replication.dbo.Vendor vend
ON vend.VendorID = sup.VendorID
INNER JOIN MOC_Replication.dbo.vendor vend2
ON vend2.VendorID = car.VendorID
LEFT OUTER JOIN MOC_DT.dbo.TCN_Xref tcn
ON tcn.recid = (SELECT MIN(recid) FROM MOC_DT.dbo.TCN_Xref
WHERE DsLocationCode = CAST(term.TerminalID AS VARCHAR)
AND (sup.SupplierCode LIKE DsSupcode + '-%')
)
WHERE (sup.SupplierID NOT BETWEEN 9993 AND 9999)
AND ord.OrderStatus = 'P'
GROUP BY
term.StateCode
,dest_ad .[State]
,ord.DestState
,ord.CarrierID
,car.CarrierCode
,car.Descr
,vend2.FederalID
,term.Descr
,dest_ad.[City]
,dest_ad.[state]
,sup.SupplierID
,sup.Descr
,vend.FederalID
,vend.Descr
,ord.PostedDate
,ordproditm.BOL
,prod.ProdCategory
,prod.ProdCode
,prod.Descr
,shpprod.TaxExemptReason
,supterm.PayFlag
,ordprod.BillCode
,cust.CustNum
,cust.CustName
,term.TermCode
,dest_ad.county
,cust.FEIN
,tcn.TerminalNumber
,shpprod.StateBillByGN
,shpprod.BillFreight
,ord.DeliveryTicketNo
,co.oaid
,co.Name
,co.Descr
,co.Companyid
Kindly let me know if it is possible to get rid of the last subquery. Forgive me if my question is incomplete.
Thanks in advance!
Chandan
May 10, 2012 at 9:21 am
You've been around long enough that you should know that we're going to ask for DDL, sample data, and the actual execution plan.
I think that part of the problem is that you're joining a bunch of tables and then grouping on almost EVERYTHING. It looks like the only items being aggregated are in the OrderProdItem table, so you could potentially aggregate that table before joining to the other tables.
Also, look at replacing your last LEFT OUTER JOIN with an OUTER APPLY using TOP (1).
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 10, 2012 at 2:02 pm
Anoter thing to try while you gather your DDL and execution plan is to move your where condition into the join condition for that table.
Jared
CE - Microsoft
May 11, 2012 at 4:44 am
drew.allen (5/10/2012)
You've been around long enough that you should know that we're going to ask for DDL, sample data, and the actual execution plan.I think that part of the problem is that you're joining a bunch of tables and then grouping on almost EVERYTHING. It looks like the only items being aggregated are in the OrderProdItem table, so you could potentially aggregate that table before joining to the other tables.
Also, look at replacing your last LEFT OUTER JOIN with an OUTER APPLY using TOP (1).
Drew
I apologize but I did not think it was required. I am attaching the DDL scripts and execution plan. Following is the actual query fired:
SELECT "Tbl1002"."OriginState" "Col1006","Tbl1002"."DestinationState" "Col1007",
"Tbl1002"."CarrierDesc" "Col1011","Tbl1002"."Origin" "Col1013",
"Tbl1002"."Destination" "Col1014","Tbl1002"."SupplierFederalID" "Col1017",
"Tbl1002"."SupplierName" "Col1018","Tbl1002"."PostedDate" "Col1019",
"Tbl1002"."BOL#" "Col1023","Tbl1002"."Type" "Col1025",
"Tbl1002"."SumOfGrossGallons" "Col1030","Tbl1002"."SumOfNetGallons" "Col1031",
"Tbl1002"."BillCode" "Col1033","Tbl1002"."CustomerName" "Col1035"
FROM "MOC_REPLICATION"."dbo"."vw_ScheduledReceipts" "Tbl1002"
WHERE ("Tbl1002"."OriginState"='NY' OR "Tbl1002"."DestinationState"='NY')
AND "Tbl1002"."Month"=(4) AND "Tbl1002"."Year"=(2012)
AND "Tbl1002"."CompanyID"=(1)
Thanks
Chandan
May 11, 2012 at 10:16 am
I still don't see any sample data. It's hard to rewrite your view if we don't have data to compare the original results with the revised results.
Also, please pick one method for aliasing columns and stick with it. Your view has both <expression> AS <alias> and <alias> = <expression>. It makes it much harder to read when you're inconsistent.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 14, 2012 at 7:32 am
drew.allen (5/11/2012)
I still don't see any sample data. It's hard to rewrite your view if we don't have data to compare the original results with the revised results.Also, please pick one method for aliasing columns and stick with it. Your view has both <expression> AS <alias> and <alias> = <expression>. It makes it much harder to read when you're inconsistent.
Drew
I am afraid that if I put some sample data, I may violate conditions of my client. Thanks for your help and time. Somebody suggested me that the left join can be calculated as a CTE in the begining and that CTE can replace the subquery. but being a DBa, I have limited skills in T-SQL(still learning).
Do you have any advice to offer here.
Regards
Chandan
May 14, 2012 at 7:36 am
chandan_jha18 (5/14/2012)
drew.allen (5/11/2012)
I still don't see any sample data. It's hard to rewrite your view if we don't have data to compare the original results with the revised results.Also, please pick one method for aliasing columns and stick with it. Your view has both <expression> AS <alias> and <alias> = <expression>. It makes it much harder to read when you're inconsistent.
Drew
I am afraid that if I put some sample data, I may violate conditions of my client. Thanks for your help and time. Somebody suggested me that the left join can be calculated as a CTE in the begining and that CTE can replace the subquery. but being a DBa, I have limited skills in T-SQL(still learning).
Do you have any advice to offer here.
Regards
Chandan
My advice is to provide sample data. Sample meaning that it does not contain real data. You cannot possibly violate any confidentiality agreement by making up some simple data.
Jared
CE - Microsoft
May 14, 2012 at 7:46 am
You've got multiple full scans running on millions of rows. Until you make the query filter that data, performance is going to stink and there's almost nothing you can do about it. You've also got some fundamental problems in the code. For example:
CAST(term.TerminalID AS VARCHAR)
Will lead to scans on your index. You can't have functions on columns in WHERE or JOIN criteria. And, if the two data types are different, why are they being used as JOIN criteria? That's not a T-SQL error. That's a design error. I'd fix it at the database structure level first, then worry about the T-SQL. After that, I'm with everyone else, why are you aggregating based on 20 columns? Are you doing that because that is the actual grouping criteria? Or, are you doing that because SQL Server forced you to add those columns to the GROUP BY statement since no aggregations were performed on them? If it's the latter, what is the real aggregate criteria? Make that into a derived table (doesn't need to be a CTE. CTEs are not magic. It can just be a sub-select similar to what you've done with your OUTER JOIN) then join that to the rest of the query.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 14, 2012 at 7:50 am
i think this should give the same results and be quicker
but without data i can't prove it
LEFT OUTER JOIN
(SELECT DsSupcode,DsLocationCode,TerminalNumber,MIN(recid) as recidmin FROM MOC_DT.dbo.TCN_Xref
WHERE
AND (group by TerminalNumber,DsLocationCode,DsSupcode )
) tcn ON tcn.DsLocationCode = CAST(term.TerminalID AS VARCHAR) and sup.SupplierCode LIKE tcn.DsSupcode + '-%'
it's also much cleaner code
MVDBA
May 14, 2012 at 8:20 am
Grant Fritchey (5/14/2012)
You've got multiple full scans running on millions of rows. Until you make the query filter that data, performance is going to stink and there's almost nothing you can do about it. You've also got some fundamental problems in the code. For example:
CAST(term.TerminalID AS VARCHAR)
Will lead to scans on your index. You can't have functions on columns in WHERE or JOIN criteria. And, if the two data types are different, why are they being used as JOIN criteria? That's not a T-SQL error. That's a design error. I'd fix it at the database structure level first, then worry about the T-SQL. After that, I'm with everyone else, why are you aggregating based on 20 columns? Are you doing that because that is the actual grouping criteria? Or, are you doing that because SQL Server forced you to add those columns to the GROUP BY statement since no aggregations were performed on them? If it's the latter, what is the real aggregate criteria? Make that into a derived table (doesn't need to be a CTE. CTEs are not magic. It can just be a sub-select similar to what you've done with your OUTER JOIN) then join that to the rest of the query.
I agree with you Sir. Design made years ago haunt us now and we keep running here and there to keep performance right. If the developers had so much information over correct database designs years ago,like this is today, we DBAs can concentrate on T-SQL too but as you said T-SQL cannot do magic if we are forcing it to compare apples and oranges. It will give us the output because we paid for it but does not guarantee performance.
Thanks
Chandan
May 14, 2012 at 8:21 am
michael vessey (5/14/2012)
i think this should give the same results and be quickerbut without data i can't prove it
LEFT OUTER JOIN
(SELECT DsSupcode,DsLocationCode,TerminalNumber,MIN(recid) as recidmin FROM MOC_DT.dbo.TCN_Xref
WHERE
AND (group by TerminalNumber,DsLocationCode,DsSupcode )
) tcn ON tcn.DsLocationCode = CAST(term.TerminalID AS VARCHAR) and sup.SupplierCode LIKE tcn.DsSupcode + '-%'
it's also much cleaner code
thanks for taking your valuable time to look at it. I can verify the results but wanted to ask if you missed a condition in your where clause before 'AND' operator.
Thanks
Chandan
May 23, 2012 at 6:11 am
this is an old thread but would like to inform that I could resolve this issue by changing the view to a procedure. Thanks to everyone who pitched in with their ideas!
Regards
Chandan
May 24, 2012 at 7:21 am
The left join on the MIN is going to suck in any case, but these are the real killers here I bet:
WHERE DsLocationCode = CAST(term.TerminalID AS VARCHAR)
AND (sup.SupplierCode LIKE DsSupcode + '-%')
the first a) voids the use of an index seek if one is appropriate and available and b) prevents the optimizer from getting good estmiates
the second will also crush the optimizer's ability to get good estimates
Speaking of estimates, there are SOOO many joins here the slightest skew can REALLY mess things up. The OR in the actual view call likewise.
Looks like a Business Objects query, or from some other automated reporting tool. Yucko!! 🙂
Lastly, one could spend tens of hours on something this complex to get it working well. I feel that this goes WAY beyond what one should expect from a free forum. Get yourself a professional tuner to help you out. That would avoid the issue of not being able to show real data too (which will ABSOLUTELY be required to get this one improved). This beast simply cannot be tuned with a few generated rows (which would be EXTRAORDINARILY difficult in any case given the number of tables involved).
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 24, 2012 at 7:36 am
TheSQLGuru (5/24/2012)
The left join on the MIN is going to suck in any case, but these are the real killers here I bet:WHERE DsLocationCode = CAST(term.TerminalID AS VARCHAR)
AND (sup.SupplierCode LIKE DsSupcode + '-%')
the first a) voids the use of an index seek if one is appropriate and available and b) prevents the optimizer from getting good estmiates
the second will also crush the optimizer's ability to get good estimates
Speaking of estimates, there are SOOO many joins here the slightest skew can REALLY mess things up. The OR in the actual view call likewise.
Looks like a Business Objects query, or from some other automated reporting tool. Yucko!! 🙂
Lastly, one could spend tens of hours on something this complex to get it working well. I feel that this goes WAY beyond what one should expect from a free forum. Get yourself a professional tuner to help you out. That would avoid the issue of not being able to show real data too (which will ABSOLUTELY be required to get this one improved). This beast simply cannot be tuned with a few generated rows (which would be EXTRAORDINARILY difficult in any case given the number of tables involved).
I apologize for posting such a long question with complex queries. As far as index scans and cast, convert functions are concerned, even a professional person would have given same advice as you experts did here. Schema is something that gets very very hard to change after few years of its inception. I tried that but was not approved. So I keep looking for some sort of band-aid 🙂
In this case, I noted that my query had a where condition and the view had a where condition built in too. Somehow when sum() function was operating, it did not consider the where clause of the query and ended up doing this operation 1 million times. Then the where filter of the query was applied to filter maximum rows and display only 4k rows. So i suggested them to have a procedure inside which all filters can be fit.
Apparently, the 'where' condition inside view was getting considered first and the 'where' condition of the query was coming into picture at a much much later stage whereas I expected SQL engine to combine both of them.
Do you know why it happened that way?
Regards
Chandan
May 24, 2012 at 8:00 am
chandan_jha18 (5/24/2012)
TheSQLGuru (5/24/2012)
The left join on the MIN is going to suck in any case, but these are the real killers here I bet:WHERE DsLocationCode = CAST(term.TerminalID AS VARCHAR)
AND (sup.SupplierCode LIKE DsSupcode + '-%')
the first a) voids the use of an index seek if one is appropriate and available and b) prevents the optimizer from getting good estmiates
the second will also crush the optimizer's ability to get good estimates
Speaking of estimates, there are SOOO many joins here the slightest skew can REALLY mess things up. The OR in the actual view call likewise.
Looks like a Business Objects query, or from some other automated reporting tool. Yucko!! 🙂
Lastly, one could spend tens of hours on something this complex to get it working well. I feel that this goes WAY beyond what one should expect from a free forum. Get yourself a professional tuner to help you out. That would avoid the issue of not being able to show real data too (which will ABSOLUTELY be required to get this one improved). This beast simply cannot be tuned with a few generated rows (which would be EXTRAORDINARILY difficult in any case given the number of tables involved).
I apologize for posting such a long question with complex queries. As far as index scans and cast, convert functions are concerned, even a professional person would have given same advice as you experts did here. Schema is something that gets very very hard to change after few years of its inception. I tried that but was not approved. So I keep looking for some sort of band-aid 🙂
In this case, I noted that my query had a where condition and the view had a where condition built in too. Somehow when sum() function was operating, it did not consider the where clause of the query and ended up doing this operation 1 million times. Then the where filter of the query was applied to filter maximum rows and display only 4k rows. So i suggested them to have a procedure inside which all filters can be fit.
Apparently, the 'where' condition inside view was getting considered first and the 'where' condition of the query was coming into picture at a much much later stage whereas I expected SQL engine to combine both of them.
Do you know why it happened that way?
Regards
Chandan
Yes, I do. Overly complex query. The optimizer can only do so much, and although it is TRULY AMAZING what it is capable of you are asking a LOT of it. You need to help it out sometimes. Given 15 years of tuning experience you learn all kinds of ways to help it out when required. 😎
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply