April 9, 2013 at 3:23 am
Hi all
I have a proc here where the estimated number of rows is around 2 trillion while the actual number of rows is 300. I have updated statistics on the relevant tables to no avail; there seems to be a hash match at the point where the rows go miles out of sync.
Does anyone have any suggestions? Currently there are no indexes on the temp tables; perhaps that would help, but I am unsure.
I have attached the plan and the code is below.
Thanks for any input
DECLARE
@paramCompany varchar(3),
@paramStartDate datetime,
@paramEndDate datetime,
@paramSalesOffice varchar(2000),
@paramSalesResponsible varchar(max),
@paramQwerty varchar(2)
SET @paramCompany = @Company
SET @paramStartDate = @StartDate
SET @paramEndDate = @EndDate
SET @paramSalesOffice = @SalesOffice
SET @paramSalesResponsible = @SalesResponsible
SET @paramQwerty = @qwerty-2
SELECT
svcitem.DESPATCHPOINT,
svcitem.DATAAREAID,
svcitem.ITEMID
INTO #d1
FROM
[BIDATAWSQL].DynamicsV5Realtime.dbo.MARDESPATCHSERVICEITEMS svcitem
INNER JOIN
[BIDATAWSQL].DynamicsV5Realtime.dbo.marreportenums enum
ON
svcitem.MARSALESCHARGEITEMTYPE = enum.ENUMVALUEINT
WHERE
enum.ENUMNAME = 'MARSalesChargeItemType'
and enum.ENUMVALUESTR = 'Excess Haulage'
and svcitem.DATAAREAID = @paramCompany
-- to get Original Sales Orders that have had an RTW order create within the date parameters
SELECT DISTINCT
st.MARIMSSALESID,
st.DATAAREAID
INTO #d2
FROM
[BIDATAWSQL].DynamicsV5Realtime.dbo.SALESTABLE st --- RTW Order
WHERE
st.DATAAREAID = @paramCompany
and st.SALESTYPE = 4 --Returned Order
and st.CUSTACCOUNT not like('K' + '%')
and st.CUSTACCOUNT not like('E' + '%')
and (CAST(FLOOR(CAST(st.CREATEDDATETIME AS FLOAT))AS DATETIME) between @paramStartDate and @paramEndDate)
SELECT
st.MARIMSSALESID as 'OriginalOrder',
st2.MARTRANSPORTGROUP as 'DespatchPoint',
st2.MARVEHICLETYPE as 'OriginalOrderVehicleType',
origwt.OriginalOrderWeight,
isnull(servchrg.OriginalServiceCharges,0) as 'OriginalServiceCharges',
CASE WHEN origwt.OriginalOrderWeight = ABS(SUM((sl.QTYORDERED * it.NETWEIGHT) / 1000))
THEN isnull(servchrg.OriginalServiceCharges,0) ELSE 0 END as 'ReturnableServiceCharges',
st.SALESID as 'RTWOrder',
st.MARSALESOFFICE as 'RTWSalesOffice',
st.MARVEHICLETYPE as 'RTWOrderVehicleType',
pm.name as 'RTWSalesResponsible',
st.CREATEDDATETIME as 'RTWOrderCreated',
st.DELIVERYCITY + ' - ' + st.DELIVERYZIPCODE as 'SiteTown',
dbo.udf_get_PricingArea(st.DELIVERYZIPCODE,#d1.ITEMID) as 'PricingArea',
st.MARTRANSPORTGROUP as 'Returnto',
ABS(SUM((sl.QTYORDERED * it.NETWEIGHT) / 1000)) as 'RTWWeight', -- in tons
ho.SALESID as 'HaulageOrder',
sl1.ITEMID as 'Item',
sl1.SALESQTY as 'Qty',
sl1.LINEAMOUNT as 'HaulageCharged',
CASE WHEN st.MARVEHICLETYPE = '0' or st2.MARVEHICLETYPE = '0' THEN 1 ELSE 2 END as 'haulcalcfactor',
coalesce(dbo.udf_get_YardPrice(st.DELIVERYZIPCODE,#d1.ITEMID), dbo.udf_get_SitePrice(st.DELIVERYZIPCODE,#d1.ITEMID)) as 'ListPrice',
coalesce(dbo.udf_get_YardPrice(st.DELIVERYZIPCODE,#d1.ITEMID), dbo.udf_get_SitePrice(st.DELIVERYZIPCODE,#d1.ITEMID))
* ABS(SUM((sl.QTYORDERED * it.NETWEIGHT) / 1000))
* (CASE WHEN st.MARVEHICLETYPE = '0' or st2.MARVEHICLETYPE = '0' THEN 1 ELSE 2 END) as 'ExpectedHaulage', --ListPrice * RTWWeight * 2 or 1 now - see CASE statement
#d1.ITEMID as 'ExcessHaulageItem',
pm.emplid as 'EmplId',
IsNull(pm.email,'gill.wilton.@marshalls.co.uk') as 'SalesResponsibleEmail'
INTO #d3
FROM
[BIDATAWSQL].DynamicsV5Realtime.dbo.SALESTABLE st --- RTW Order
LEFT OUTER JOIN
udf_R000_EmployeeList() pm
ON
st.SALESRESPONSIBLE = pm.emplid
INNER JOIN
[BIDATAWSQL].DynamicsV5Realtime.dbo.SALESLINE sl
ON
sl.SALESID = st.SALESID
and sl.DATAAREAID = st.DATAAREAID
INNER JOIN
#d2
ON
#d2.MARIMSSALESID = st.MARIMSSALESID
and #d2.DATAAREAID = st.DATAAREAID
INNER JOIN
[BIDATAWSQL].DynamicsV5Realtime.dbo.INVENTTABLE it
ON
it.ITEMID = sl.ITEMID
and it.DATAAREAID = sl.DATAAREAID
LEFT OUTER JOIN
[BIDATAWSQL].DynamicsV5Realtime.dbo.INVENTDIM id
ON
id.INVENTDIMID = sl.INVENTDIMID
and id.DATAAREAID = sl.DATAAREAID
INNER JOIN
(SELECT
st.SALESID,
st.MARIMSSALESID,
st.DATAAREAID,
st.CREATEDDATETIME
FROM
[BIDATAWSQL].DynamicsV5Realtime.dbo.SALESTABLE st
INNER JOIN
[BIDATAWSQL].DynamicsV5Realtime.dbo.SALESLINE sl
ON
sl.SALESID = st.SALESID
and sl.DATAAREAID = st.DATAAREAID
WHERE
st.DATAAREAID = @paramCompany
and st.CUSTACCOUNT not like('K' + '%')
and st.CUSTACCOUNT not like('E' + '%')
and sl.ITEMID = 'JZEX111RTW0'
) ho -- Haulage Order
ON
ho.MARIMSSALESID = st.MARIMSSALESID
and ho.DATAAREAID = st.DATAAREAID
and ho.SALESID <> st.SALESID
INNER JOIN
[BIDATAWSQL].DynamicsV5Realtime.dbo.SALESLINE sl1
ON
sl1.SALESID = ho.SALESID
and sl1.DATAAREAID = ho.DATAAREAID
and sl1.ITEMID = 'JZEX111RTW0'
INNER JOIN
[BIDATAWSQL].DynamicsV5Realtime.dbo.SALESTABLE st1
ON
st1.SALESID = sl1.SALESID
and st1.DATAAREAID = sl1.DATAAREAID
LEFT OUTER JOIN
[BIDATAWSQL].DynamicsV5Realtime.dbo.SALESTABLE st2
ON
st2.SALESID = st.MARIMSSALESID
and st2.DATAAREAID = st.DATAAREAID
-- to get total value of original order service charges
LEFT OUTER JOIN
(SELECT
sl.SALESID,
sl.DATAAREAID,
SUM(sl.SALESQTY * sl.SALESPRICE) as 'OriginalServiceCharges'
FROM
[BIDATAWSQL].DynamicsV5Realtime.dbo.SALESLINE sl
WHERE
sl.ITEMID like ('JZEX' + '%')
GROUP BY
sl.SALESID,
sl.DATAAREAID) servchrg
ON
servchrg.SALESID = st2.SALESID
and servchrg.DATAAREAID = st2.DATAAREAID
-- to get net weight of original order items sent
LEFT OUTER JOIN
(SELECT
sl.SALESID,
sl.DATAAREAID,
ABS(SUM(sl.SALESQTY * it.NETWEIGHT / 1000)) as 'OriginalOrderWeight'
FROM
[BIDATAWSQL].DynamicsV5Realtime.dbo.SALESLINE sl
INNER JOIN
[BIDATAWSQL].DynamicsV5Realtime.dbo.INVENTTABLE it
ON
it.ITEMID = sl.ITEMID
and it.DATAAREAID = sl.DATAAREAID
WHERE
sl.ITEMID not like ('JZEX' + '%')
GROUP BY
sl.SALESID,
sl.DATAAREAID) origwt
ON
origwt.SALESID = st2.SALESID
and origwt.DATAAREAID = st2.DATAAREAID
LEFT OUTER JOIN
#d1
ON
#d1.DESPATCHPOINT = st2.MARTRANSPORTGROUP
and #d1.DATAAREAID = st2.DATAAREAID
WHERE
st.DATAAREAID = @paramCompany
and st.SALESTYPE = 4 --Returned Order
and st.CUSTACCOUNT not like('K' + '%')
and st.CUSTACCOUNT not like('E' + '%')
and st1.MARVEHICLETYPE = '100' --DUMMY T.M.G. (ADMINISTRATION)
--and st1.MARFOEORDERTYPE = 0 -- commented out for Gill to test and advise ???
and (st.MARSALESOFFICE in(select * from udf_MultiValueParameterHandlingString(@paramSalesOffice)) or @paramSalesOffice = 'All')
and (pm.emplid IN (select * from udf_MultiValueParameterHandlingString(@paramSalesResponsible)) or @paramSalesResponsible = 'All')
GROUP BY
st.SALESID,
st.MARIMSSALESID,
st2.MARTRANSPORTGROUP,
st.DELIVERYCITY + ' - ' + st.DELIVERYZIPCODE,
st.MARTRANSPORTGROUP,
ho.SALESID,
sl1.ITEMID,
sl1.SALESQTY,
sl1.LINEAMOUNT,
st.DELIVERYZIPCODE,
#d1.ITEMID,
st.CREATEDDATETIME,
servchrg.OriginalServiceCharges,
origwt.OriginalOrderWeight,
st.MARVEHICLETYPE ,
st2.MARVEHICLETYPE,
pm.name,
st.MARSALESOFFICE,
pm.emplid,
IsNull(pm.email,'gill.wilton.@marshalls.co.uk')
SELECT
#d3.OriginalOrder,
#d3.DespatchPoint,
#d3.OriginalOrderVehicleType,
#d3.OriginalOrderWeight,
#d3.OriginalServiceCharges,
#d3.ReturnableServiceCharges,
#d3.RTWOrder,
#d3.RTWSalesOffice,
#d3.RTWOrderVehicleType,
#d3.RTWSalesResponsible,
#d3.RTWOrderCreated,
#d3.SiteTown,
#d3.PricingArea,
#d3.Returnto,
#d3.RTWWeight,
#d3.HaulageOrder,
#d3.Item,
#d3.Qty,
#d3.HaulageCharged,
#d3.ListPrice,
#d3.ExpectedHaulage,
#d3.ExcessHaulageItem,
#d3.HaulageCharged - #d3.ExpectedHaulage as 'OldValDiff',
(#d3.HaulageCharged - #d3.ExpectedHaulage) / (#d3.ExpectedHaulage) * 100 as 'OldPercentDiff',
#d3.HaulageCharged - #d3.ExpectedHaulage - #d3.ReturnableServiceCharges as 'ValDiff',
(#d3.HaulageCharged - #d3.ExpectedHaulage - #d3.ReturnableServiceCharges) / (#d3.ExpectedHaulage + #d3.ReturnableServiceCharges) * 100 as 'PercentDiff',
#d3.haulcalcfactor,
#d3.EmplId,
#d3.SalesResponsibleEmail
INTO #d4
FROM
#d3
WHERE
(#d3.HaulageCharged - #d3.ExpectedHaulage - #d3.ReturnableServiceCharges) / (#d3.ExpectedHaulage + #d3.ReturnableServiceCharges) * 100 <= -1 --< -0.05
IF @paramQwerty = 'D' SELECT * FROM #d4
IF @paramQwerty = 'S' SELECT DISTINCT #d4.EmplId, #d4.SalesResponsibleEmail FROM #d4
ORDER BY
1
DROP TABLE #d1
DROP TABLE #d2
DROP TABLE #d3
DROP TABLE #d4
GO
April 9, 2013 at 4:28 am
" ... Thanks for any input ..."
I hate "ORDER BY 1" type code. Why not put the column name so it's clear?
And no comments and meaningless temp table names... very frustrating for anyone else who has to work on it.
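For instance, the 'S' branch of the proc above could name its sort column explicitly. EmplId is column 1 of that SELECT, so this is equivalent to the existing ORDER BY 1:

```sql
-- Same query as the proc's @paramQwerty = 'S' branch, with the sort
-- column named instead of referenced by ordinal position.
SELECT DISTINCT #d4.EmplId, #d4.SalesResponsibleEmail
FROM #d4
ORDER BY #d4.EmplId
```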
April 9, 2013 at 5:00 am
Yes, I am aware it's a bit of a nightmare; this is what happens when BAs write code. I am currently tearing my hair out re-writing it.
April 9, 2013 at 5:15 am
It looks like the query is accessing tables on a different server.
Unlike on the local server, distribution statistics from the remote server will not be returned if the user does not have permission on the remote server (e.g. sysadmin, db_owner) to retrieve the statistics distribution results, and the optimizer will fall back on default assumptions.
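One quick way to test this (the index name below is just a placeholder, and RPC Out must be enabled on the linked server):

```sql
-- Run from the local server; the batch executes on [BIDATAWSQL].
-- If this fails with a permission error, the local optimizer will not
-- be able to retrieve remote statistics either and will guess instead.
EXEC ('USE DynamicsV5Realtime;
       DBCC SHOW_STATISTICS (''dbo.SALESTABLE'', ''PK_SALESTABLE'');')
AT [BIDATAWSQL]
```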
April 9, 2013 at 6:04 am
An index on #d2 supporting this join ([#d2].MARIMSSALESID) = ([DynamicsV5Realtime].[dbo].[SALESTABLE].MARIMSSALESID)
would reduce the exaggerated estimated row counts, but that's beside the point. This lot is unlikely to run well unless the whole caboodle is run remotely. I'd recommend you create a reporting db on server [BIDATAWSQL] and package these queries into a stored procedure in it. Not only would it perform far better than it does now, it would be a heck of a lot easier to perform the remaining tuning. You can only go so far with it as it is.
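Something along these lines, created right after #d2 is populated (the index name is arbitrary; DATAAREAID is included because the join in the proc uses both columns):

```sql
-- Sketch only: a clustered index on the temp table's join columns gives
-- the optimizer a sorted structure with statistics to estimate against.
CREATE CLUSTERED INDEX IX_d2_MARIMSSALESID
    ON #d2 (MARIMSSALESID, DATAAREAID)
```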
If your udf's, particularly udf_R000_EmployeeList, aren't already configured as inline table-valued functions, then consider recoding them.
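The body of udf_R000_EmployeeList isn't shown in this thread, so the shape below is purely illustrative, but an inline TVF looks like this - a single RETURN of a SELECT, which the optimizer expands into the calling query like a view:

```sql
-- Hypothetical rewrite: the table and column names are assumptions.
-- A multi-statement TVF (RETURNS @t TABLE (...)) hides its contents
-- from the optimizer; the inline form below does not.
CREATE FUNCTION dbo.udf_R000_EmployeeList ()
RETURNS TABLE
AS
RETURN
(
    SELECT e.emplid, e.name, e.email
    FROM dbo.Employees e      -- assumed source table
)
```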
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 9, 2013 at 6:32 am
You can put indexes on the columns which are used in your WHERE and ON clauses.
_______________________________________________________________
To get quick answer follow this link:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
April 10, 2013 at 3:43 am
ChrisM@Work (4/9/2013)
An index on #d2 supporting this join ([#d2].MARIMSSALESID) = ([DynamicsV5Realtime].[dbo].[SALESTABLE].MARIMSSALESID)
would reduce the exaggerated estimated row counts, but that's beside the point. This lot is unlikely to run well unless the whole caboodle is run remotely. I'd recommend you create a reporting db on server [BIDATAWSQL] and package these queries into a stored procedure in it. Not only would it perform far better than it does now, it would be a heck of a lot easier to perform the remaining tuning. You can only go so far with it as it is. If your udf's, particularly udf_R000_EmployeeList, aren't already configured as inline table-valued functions, then consider recoding them.
Hi all and thanks for the input. I will check out ChrisM's suggestion in an attempt to fix the hash match issue. As for your suggestion on creating a reporting DB, that is actually what we have; however, the tester was pointing from a test box to a live server. He was doing this in SSRS and the code was failing validation when executing, which I believe was being caused by the large number of estimated rows.
April 10, 2013 at 9:35 am
This article by Benjamin Nevarez confirms what @sharky pointed out - if the user isn't the owner of the table or doesn't belong to the sysadmin, db_owner, or db_ddladmin roles on the remote server, the optimizer on the local server cannot get statistics for that data on the remote server. Lack of statistics almost always results in a suboptimal query plan.
As @ChrisM@Work suggested, this query likely will perform a lot better if all the work to get the data from the remote server happens on the remote server. When a query references only tables on a remote server, SQL Server is smart enough to package the whole thing up and send it to the remote server for execution there. When a query joins local tables and remote tables, though, the optimizer has to decide whether it's better to request the data from the remote server and perform the join locally or to send the local data to the remote server to execute the join there and return the results. Based on the execution plan you provided, it looks like SQL Server is requesting the data from the remote machine and performing the joins locally because of the joins to #d1 and #d2 in the portion of your proc that populates #d3. Without good statistics on the data from the remote server, the optimizer has little to base those decisions on and falls back on default assumptions.
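As a sketch of that approach (every object name here is a placeholder - the real proc and database would live on [BIDATAWSQL]):

```sql
-- Only the parameters and the final result set cross the linked server;
-- all the joins and temp-table work happen where the data lives.
EXEC [BIDATAWSQL].ReportingDB.dbo.usp_RTWHaulageReport
     @Company          = @paramCompany,
     @StartDate        = @paramStartDate,
     @EndDate          = @paramEndDate,
     @SalesOffice      = @paramSalesOffice,
     @SalesResponsible = @paramSalesResponsible,
     @Qwerty           = @paramQwerty
```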
One thing you didn't mention in your post, though, is how you perceive that the erroneous estimates are affecting the actual execution of the proc. Usually, execution plans tank when the actual row counts exceed the estimated row counts, because the resources allocated and operations chosen for the execution are inadequate to deal with the larger actual row counts. In your case, the actual row counts are much, much smaller than the estimates - hundreds or thousands rather than billions. I wouldn't expect this to drag down the performance of this query too much - it may run slower than necessary, but on the order of a few seconds rather than the hours and hours that can result from underestimated row counts. I might expect to see the performance of other queries on the server suffer, though, because this query will have much-larger-than-necessary memory grants and other resources allocated to it. Is this thing running unacceptably slowly, or are you just trying to understand the discrepancy between the estimated and actual row counts?
Jason Wolfkill
April 10, 2013 at 9:41 am
Yes, that's right. I was really just trying to understand the differences between the estimated and actual numbers of rows, and the explanations you have given go some way to doing that.
In fact, the reason I was looking at this at all was because in SSRS the code validation fails on a subscription yet the subscription runs fine. I am currently looking into the use of temp tables when validating code this way, as I believe the issue may lie there.
Thanks
April 10, 2013 at 9:58 am
Kwisatz78 (4/10/2013)
Yes, that's right. I was really just trying to understand the differences between the estimated and actual numbers of rows, and the explanations you have given go some way to doing that. In fact, the reason I was looking at this at all was because in SSRS the code validation fails on a subscription yet the subscription runs fine. I am currently looking into the use of temp tables when validating code this way, as I believe the issue may lie there.
Thanks
Do you mean the T-SQL generates a syntax error when parsed or that a run-time error occurs on execution when SSRS calls the stored proc? Using temp tables in the stored proc shouldn't be a problem. The first thing I always check in this situation is whether the credentials passed to the remote server have the appropriate permissions there. Linked server configurations default to using the security credentials of the current local login, but you can map a local login to a different set of credentials to use on the remote server. What credentials is your SSRS report running with, what credentials does the linked server connection pass to the remote server, and are the necessary permissions present in both places?
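If the mapping turns out to be the problem, sp_addlinkedsrvlogin is the documented way to map a local login to specific remote credentials (the login names below are made up):

```sql
-- Hypothetical mapping: the local SSRS service account connects to
-- [BIDATAWSQL] as a remote login that has the permissions it needs.
EXEC sp_addlinkedsrvlogin
     @rmtsrvname  = N'BIDATAWSQL',
     @useself     = 'false',
     @locallogin  = N'DOMAIN\SSRSService',   -- assumed local login
     @rmtuser     = N'ReportReader',         -- assumed remote login
     @rmtpassword = N'...'                   -- placeholder
```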