February 24, 2016 at 7:17 pm
Hi All,
I've inherited this stored proc and I'm trying to improve the performance of it. I really hate a couple of things about the way it was written.
First, I hate that it's broken into two separate inserts rather than a single insert with the two selects combined in a UNION.
Second, I hate that each query is built on nested subqueries. I'm thinking of changing those to either CTEs or possibly temp tables, but I'm not sure which is the better way to go, so I'm asking the SQL community at large.
CREATE PROCEDURE [stg].[p_ads_Staging_ScanSummary_Load] AS
BEGIN
TRUNCATE TABLE ADS.STG.Staging_ScanSummary
INSERT INTO ADS.STG.Staging_ScanSummary
SELECT
Region,
Company,
DepartmentCode,
ModalityID,
CustomerNbr,
Asset_id,
Servicedate,
Date_ID,
PostDate,
[retail_scan_count],
'0' AS Wholesale_scan_count,
'0' AS wbsscancount,
[retail_scan_count] AS Nbrscans,
'Retail' as Loadsource
FROM (
SELECT
Region,
Company,
DepartmentCode,
ModalityID,
CustomerNbr,
Asset_id,
Servicedate,
Date_ID,
PostDate,
sum([retail_scan_count]) AS [retail_scan_count]
FROM
(SELECT
o.sublatcode AS Region,
C.CompanyCode AS Company,
D.DepartmentCode AS DepartmentCode,
Isnull(M.ModalityID,CSN.ModalityID) AS ModalityID,
CONVERT(VARCHAR(50),IGY.CustNmbr) AS CustomerNbr,
ISNULL(CA.UnitNbr,'0000') AS Asset_id,
IGY.DOS AS Servicedate,
(SELECT P.DateID FROM ADW.dim.Period P WHERE P.[Date] = IGY.DOS) AS Date_ID,
(SELECT P.DateID FROM ADW.dim.Period P WHERE P.[Date] = IGY.PostDate) AS PostDate,
IGY.[ScanCount] AS [retail_scan_count]
FROM [ADS].[dbo].[Intergy_RetailRevenueDetailAINEHE] IGY
JOIN ADW.DIM.CustomerServiceNumber CSN
ON CONVERT(VARCHAR(50),IGY.CustNmbr) = CSN.CustomerServiceNumber
JOIN ADW.dim.Company C
ON CSN.CompanyID = C.CompanyID
LEFT JOIN ADW.dim.Modality M
ON IGY.modality = M.modalitycode
JOIN ADW.dim.Department D
ON D.DepartmentID = CASE WHEN M.departmentID is null then (select MO.departmentID from adw.dim.Modality MO where MO.ModalityID = csn.ModalityID) else
M.departmentID end
JOIN ADW.DIM.Period P
ON IGY.DOS = P.[Date]
JOIN ADW.dim.orgunit o
ON o.orgunitid=CSN.orgunitid
LEFT JOIN ADW.dim.v_CurrentAsset CA
ON CA.UnitNbr = IGY.Unit
WHERE CSN.Is_Current = 1
) SUBQUERY
group by Region,Company,DepartmentCode,ModalityID,CustomerNbr,Asset_id,Servicedate,Date_ID,PostDate
HAVING SUM(ISNULL([retail_scan_count],0)) <> 0) INTERGYOUTPUT
INSERT INTO ADS.STG.Staging_ScanSummary
SELECT
Region,
Company,
DepartmentCode,
ModalityID,
CustomerNbr,
Asset_id,
Servicedate,
Date_ID,
PostDate,
'0' AS [retail_scan_count],
WholeSaleScanCount,
WBScanCount,
WholeSaleScanCount AS Nbrscans,
'Wholesale' as Loadsource
FROM (
SELECT
Region,
Company,
DepartmentCode,
ModalityID,
ServiceNbr AS CustomerNbr,
UnitSegment AS Asset_id,
Servicedate,
Date_ID,
PostDate,
SUM(WholeSaleScanCount) as WholeSaleScanCount,
SUM(WBScanCount) AS WBScanCount
FROM (
select
OU.SubLatCode AS Region,
C.CompanyCode AS Company,
D.DepartmentCode,
ISNull(M.ModalityID,CSN.ModalityID) AS ModalityID,
SO.ServiceNbr,
ISNULL(CA.UnitNbr,'0000') AS UnitSegment,
SO.Servicedate,
(SELECT P.DateID FROM ADW.dim.Period P WHERE P.[Date] = SO.ServiceDate) AS Date_ID,
(SELECT P.DateID FROM ADW.dim.Period P WHERE P.[Date] = SO.PostDate) AS PostDate,
SO.ScanCount as WholeSaleScanCount,
SO.WBScanCount AS WBScanCount
from [ADS].[dbo].[ScansODS] SO
JOIN ADW.dim.CustomerServiceNumber CSN
ON SO.ServiceNbr = CSN.CustomerServiceNumber
JOIN ADW.DIM.OrgUnit OU
ON CSN.OrgUnitID = OU.OrgUnitID
join ADW.dim.Company C
ON CSN.CompanyID = C.CompanyID
LEFT join ADW.DIM.Modality M
ON SO.ModalityCode = M.ModalityCode
JOIN ADW.dim.Department D
ON D.DepartmentID = CASE WHEN M.departmentID is null then (select MO.departmentID from adw.dim.Modality MO where MO.ModalityID = csn.ModalityID) else
M.departmentID end
JOin ADW.DIM.Period P
ON SO.Servicedate = P.[date]
LEFT JOIN ADW.dim.v_CurrentAsset CA
ON CA.UnitNbr = SO.UnitSegment
Where CSN.Is_Current = 1 AND
SO.ScanType= 'Wholesale'
) SUBQUERY2
group by Region,Company,DepartmentCode,ModalityID,ServiceNbr,UnitSegment,Servicedate,Date_ID,PostDate) SCANOUTPUT
END
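For what it's worth, the shape I have in mind for the combined version is roughly this. It's only a sketch of the structure (the two inner aggregate queries would stay exactly as they are now, so I've left them as placeholders), not tested:
WITH IntergyOutput AS
(
    SELECT ... -- the existing INTERGYOUTPUT aggregate query, unchanged
),
ScanOutput AS
(
    SELECT ... -- the existing SCANOUTPUT aggregate query, unchanged
)
INSERT INTO ADS.STG.Staging_ScanSummary
SELECT Region, Company, DepartmentCode, ModalityID, CustomerNbr, Asset_id,
       Servicedate, Date_ID, PostDate,
       [retail_scan_count], '0' AS Wholesale_scan_count, '0' AS wbsscancount,
       [retail_scan_count] AS Nbrscans, 'Retail' AS Loadsource
FROM IntergyOutput
UNION ALL
SELECT Region, Company, DepartmentCode, ModalityID, CustomerNbr, Asset_id,
       Servicedate, Date_ID, PostDate,
       '0', WholeSaleScanCount, WBScanCount,
       WholeSaleScanCount AS Nbrscans, 'Wholesale' AS Loadsource
FROM ScanOutput;
The temp table route would just materialize those two inner queries into #temp tables first and select from them instead.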
February 24, 2016 at 8:15 pm
First, you need to replace this strange piece of code:
LEFT join ADW.DIM.Modality M
ON SO.ModalityCode = M.ModalityCode
JOIN ADW.dim.Department D
ON D.DepartmentID = CASE WHEN M.departmentID is null then (select MO.departmentID from adw.dim.Modality MO where MO.ModalityID = csn.ModalityID) else
M.departmentID end
with this one:
LEFT join ADW.DIM.Modality M ON SO.ModalityCode = M.ModalityCode
LEFT JOIN adw.dim.Modality MO ON MO.ModalityID = csn.ModalityID
INNER JOIN ADW.dim.Department D ON D.DepartmentID = ISNULL(M.departmentID, MO.departmentID )
For the rest - as usual, please post table definitions and the current execution plan.
_____________
Code for TallyGenerator
February 24, 2016 at 8:21 pm
Oh yeah, and this:
(SELECT P.DateID FROM ADW.dim.Period P WHERE P.[Date] = IGY.DOS) AS Date_ID,
(SELECT P.DateID FROM ADW.dim.Period P WHERE P.[Date] = IGY.PostDate) AS PostDate,
IGY.[ScanCount] AS [retail_scan_count]
FROM [ADS].[dbo].[Intergy_RetailRevenueDetailAINEHE] IGY
replace with this:
DOS.DateID AS Date_ID, PD.DateID AS PostDate,
IGY.[ScanCount] AS [retail_scan_count]
FROM [ADS].[dbo].[Intergy_RetailRevenueDetailAINEHE] IGY
LEFT JOIN ADW.dim.Period DOS ON DOS.[Date] = IGY.DOS
LEFT JOIN ADW.dim.Period PD ON PD.[Date] = IGY.PostDate
_____________
Code for TallyGenerator
February 24, 2016 at 10:49 pm
It seems to me that you are using dimension tables from your DW and joining them to a table/view in your staging area, so there is no direct relationship to the staging table and you have to join on the natural keys.
A few things to keep in mind when you do that:
1. This is essentially ETL (the L stands for "Loading"), so when you are preparing your staging data, make sure there are indexes on the tables you join to. It feels like most of these queries are doing table scans behind the scenes. Check the query plan for this query, see what is missing, and add those indexes.
2. Avoid correlated subqueries, as Sergiy suggested above. They are RBAR, meaning they execute once for each row, and if you have millions of rows you can imagine what the story will be.
3. Avoid scalar functions in the JOIN clause; conditional joins are also performance killers.
4. It's not a bad thing to load data from two different sources and insert it into the same destination; in ETL you do that all the time. As long as performance is not an issue, you shouldn't change it.
5. It's always better to pre-aggregate the data before joining; it performs better because there are fewer rows to process. Check whether you can do that before joining to the dimension tables (see the sketch after this list).
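For example, something along these lines for the wholesale side. This is a rough sketch only, using the table and column names from your proc; the grouping keys are a guess and it is not tested:
WITH ScansAgg AS
(
    -- aggregate the source rows first, so the dimension joins work on fewer rows
    SELECT SO.ServiceNbr,
           SO.UnitSegment,
           SO.ModalityCode,
           SO.ServiceDate,
           SO.PostDate,
           SUM(SO.ScanCount)   AS WholeSaleScanCount,
           SUM(SO.WBScanCount) AS WBScanCount
    FROM ADS.dbo.ScansODS SO
    WHERE SO.ScanType = 'Wholesale'
    GROUP BY SO.ServiceNbr, SO.UnitSegment, SO.ModalityCode, SO.ServiceDate, SO.PostDate
)
SELECT OU.SubLatCode AS Region,
       C.CompanyCode AS Company,
       SA.ServiceNbr AS CustomerNbr,
       SA.WholeSaleScanCount,
       SA.WBScanCount
FROM ScansAgg SA
JOIN ADW.dim.CustomerServiceNumber CSN
     ON SA.ServiceNbr = CSN.CustomerServiceNumber
    AND CSN.Is_Current = 1
JOIN ADW.dim.OrgUnit OU
     ON CSN.OrgUnitID = OU.OrgUnitID
JOIN ADW.dim.Company C
     ON CSN.CompanyID = C.CompanyID;
-- the remaining joins (Modality, Department, Period, v_CurrentAsset) would follow the same pattern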
Hope it helps.
February 25, 2016 at 1:38 am
Sergiy (2/24/2016)
First, you need to replace this strange piece of code:
LEFT join ADW.DIM.Modality M
ON SO.ModalityCode = M.ModalityCode
JOIN ADW.dim.Department D
ON D.DepartmentID = CASE WHEN M.departmentID is null then (select MO.departmentID from adw.dim.Modality MO where MO.ModalityID = csn.ModalityID) else
M.departmentID end
with this one:
LEFT join ADW.DIM.Modality M ON SO.ModalityCode = M.ModalityCode
LEFT JOIN adw.dim.Modality MO ON MO.ModalityID = csn.ModalityID
INNER JOIN ADW.dim.Department D ON D.DepartmentID = ISNULL(M.departmentID, MO.departmentID )
For the rest - as usual, please post table definitions and the current execution plan.
Thanks, I'll give those a try. Here's the execution plan:
February 25, 2016 at 1:44 am
twin.devil (2/24/2016)
It seems to me that you are using dimension tables from your DW and joining them to a table/view in your staging area, so there is no direct relationship to the staging table and you have to join on the natural keys. A few things to keep in mind when you do that:
1. This is essentially ETL (the L stands for "Loading"), so when you are preparing your staging data, make sure there are indexes on the tables you join to. It feels like most of these queries are doing table scans behind the scenes. Check the query plan for this query, see what is missing, and add those indexes.
2. Avoid correlated subqueries, as Sergiy suggested above. They are RBAR, meaning they execute once for each row, and if you have millions of rows you can imagine what the story will be.
3. Avoid scalar functions in the JOIN clause; conditional joins are also performance killers.
4. It's not a bad thing to load data from two different sources and insert it into the same destination; in ETL you do that all the time. As long as performance is not an issue, you shouldn't change it.
5. It's always better to pre-aggregate the data before joining; it performs better because there are fewer rows to process. Check whether you can do that before joining to the dimension tables.
Hope it helps.
Thanks, these are some good suggestions. Since I didn't write the initial SP (I just inherited it today), I didn't have any say in the subqueries, scalar functions in the joins, etc.
With regard to your #4, would it make more sense to do the selects as a UNION and do the insert only once, versus the two inserts it does now?
The other avenue I'm considering is using the select(s) as a data source in SSIS and letting SSIS do the insert.
February 25, 2016 at 9:27 pm
Okay, for some reason the link to the execution plan didn't come out right. Here it is again:
February 25, 2016 at 11:22 pm
It's not really a problem to inherit an SP; most of us have worked on inherited code. What matters is that you're willing to make it better, so I'd say this is an opportunity to get your hands dirty. As per your query plan, the major hit you are taking is from table scans, as I mentioned earlier; second, you are being hurt by missing indexes; and third, you are using conditional joins.
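For example, an index along these lines might help the repeated lookups into CustomerServiceNumber. The key and INCLUDE columns are only a guess based on the joins and the Is_Current filter in your proc; check the missing-index suggestions in the actual plan before creating anything:
-- Guessed columns; validate against the plan's missing index recommendations first.
CREATE NONCLUSTERED INDEX IX_CustomerServiceNumber_NaturalKey
    ON ADW.dim.CustomerServiceNumber (CustomerServiceNumber, Is_Current)
    INCLUDE (CompanyID, ModalityID, OrgUnitID);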
My question would be: have you tried anything at your end yet, along the lines suggested earlier?