February 14, 2010 at 2:38 pm
I’ve written a common table expression that combines complex information on staff movements, allowances paid and salary increments from various tables into one table that correctly shows what a person has been paid in terms of allowances and salary over time. However, due to the way the data is stored in the database (staff movements and allowances are recorded with a start and end date and salary increments are recorded at a point in time), the salary amount appears as a null if an increment doesn’t occur within a staff movement period. The data looks like this:
CREATE TABLE #SalaryHistory
(RowID int, EmployeeID int, SalaryDate datetime, SalaryAmount money)
INSERT INTO #SalaryHistory (RowID, EmployeeID, SalaryDate, SalaryAmount)
SELECT 1, 12, '20050101', 50000 UNION ALL
SELECT 2, 12, '20050720', 52010 UNION ALL
SELECT 3, 12, '20070201', null UNION ALL
SELECT 4, 12, '20080510', 53100 UNION ALL
SELECT 5, 12, '20080612', null UNION ALL
SELECT 6, 12, '20090105', null UNION ALL
SELECT 7, 12, '20090301', null UNION ALL
SELECT 8, 12, '20090405', 56230 UNION ALL
SELECT 9, 34, '20050401', 47000
select * from #SalaryHistory order by 2, 3
drop table #SalaryHistory
But should look like this:
CREATE TABLE #SalaryHistoryComplete
(RowID int, EmployeeID int, SalaryDate datetime, SalaryAmount money)
INSERT INTO #SalaryHistoryComplete (RowID, EmployeeID, SalaryDate, SalaryAmount)
SELECT 1, 12, '20050101', 50000 UNION ALL
SELECT 2, 12, '20050720', 52010 UNION ALL
SELECT 3, 12, '20070201', 52010 UNION ALL
SELECT 4, 12, '20080510', 53100 UNION ALL
SELECT 5, 12, '20080612', 53100 UNION ALL
SELECT 6, 12, '20090105', 53100 UNION ALL
SELECT 7, 12, '20090301', 53100 UNION ALL
SELECT 8, 12, '20090405', 56230 UNION ALL
SELECT 9, 34, '20050401', 47000
select * from #SalaryHistoryComplete order by 2, 3
drop table #SalaryHistoryComplete
What I want to do is to replace null salary amounts with the value that appears in the preceding row and to continue doing this until up until the next row with a value in that column. However, I don’t want it to overwrite a null salary amount for the next employee if they happen to have a null at the start of their salary history. What would be the best way of doing this (preferably not using a cursor)? As I’m a relative newby, would it be possible for a working example of the code to be written for the first sample table above so that I can see how it works? Thanks.
February 14, 2010 at 3:59 pm
I'm pretty sure there will be a more elegant solution posted shortly but meanwhile here's what I came up with:
(Note: once you reviewed the result, change the SELECT line to the UPDATE line and check if this is what you're looking for)
;WITH cte AS -- get RowIds with NULL values for SalaryAmount
(
SELECT RowID,EmployeeID
FROM #SalaryHistory
WHERESalaryAmount IS NULL
)
,
cte2 AS -- get RowId for each NULL value from previous cte with values for SalaryAmount NOT NULL
(
SELECT cte.RowID,cte.EmployeeID,MAX(#SalaryHistory.rowid) AS s
FROM #SalaryHistory
INNER JOIN cte ON cte.EmployeeID=#SalaryHistory.EmployeeID
WHERE #SalaryHistory.SalaryAmount IS NOT NULL
AND #SalaryHistory.RowID<cte.RowID
GROUP BY cte.RowID,cte.EmployeeID
)
,
cte3 AS -- assign SalaryAmount to the RowId's found in cte2
(
SELECT cte2.rowid,sh.salaryamount
FROM cte2
INNER JOIN #SalaryHistory sh ON cte2.s=sh.rowid
)
--UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount
SELECT cte3.*
FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid
SELECT * FROM #SalaryHistory ORDER BY EmployeeID, SalaryDate
February 14, 2010 at 4:42 pm
lmu92 Your code:
UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount
SELECT cte3.*
FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid
Your code kept giving me an error The multi-part identifier "cte3.salaryamount" could not be bound after uncommenting the UPDATE statement
Looks like your cut and paste sort goofed up a slight amount. Had to add a bit of a statement to get it to work -- or am I doing something wrong?
UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount
--=== added code ===
FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid
--===End addition
SELECT cte3.*
FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid
February 14, 2010 at 4:53 pm
My code snippet has the following lines at then end:
--UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount
SELECT cte3.*
FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid
The UPDATE and the SELECT part are alternatives. The way you changed it would leave the UPDATE without any referenced table causing the error.
If you want to run the UPDATE you need to uncomment the UPDATE part and comment out the SELECT part:
UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount
--SELECT cte3.*
FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid
Seems like my description ("change the SELECT line to the UPDATE line") wasn't clear enough... I'm sorry!
February 14, 2010 at 5:07 pm
Imu92
My apologies ... for not being more careful .... anyway I hope the end result is that to poster of the question will understand it better than I did
Again except my apologies for misunderstanding
February 14, 2010 at 5:19 pm
There's no reason to apologize, Ron!!
If there's anybody that made a mistake than it will be me, since I didn't describe the usage of my script well enough so I confused you!
Fortunately, you brought up the issue so I could clarify it and the OP and/or others will (hopefully) benefit from it. Thank you again!
February 14, 2010 at 5:30 pm
Do it all at once and without a triangular join...
CREATE CLUSTERED INDEX IX_SalaryHistory_RowID
ON #SalaryHistory (RowID)
DECLARE @PrevSalaryAmount MONEY
UPDATE #SalaryHistory
SET @PrevSalaryAmount = SalaryAmount = CASE WHEN SalaryAmount >= 0 THEN SalaryAmount ELSE @PrevSalaryAmount END
FROM #SalaryHistory WITH(TABLOCKX)
OPTION (MAXDOP 1)
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2010 at 6:04 pm
If you consider using Jeffs solution (which performs much better than mine) I think it's mandatory to read the Running Total / quirky update article[/url] including the discussion.
I think it's really important to understand how it works and why Jeff posted the code the way he did (including clustered index, table lock and MAXDOP option).
Jeff Moden (2/14/2010)
Do it all at once and without a triangular join...
CREATE CLUSTERED INDEX IX_SalaryHistory_RowID
ON #SalaryHistory (RowID)
DECLARE @PrevSalaryAmount MONEY
UPDATE #SalaryHistory
SET @PrevSalaryAmount = SalaryAmount = CASE WHEN SalaryAmount >= 0 THEN SalaryAmount ELSE @PrevSalaryAmount END
FROM #SalaryHistory WITH(TABLOCKX)
OPTION (MAXDOP 1)
February 14, 2010 at 6:15 pm
@ Jeff: at a second thought I kinda disagree with your clustered index...
Shouldn't it be on (EmployeeID, SalaryDate) instead of RowID?
What would happen with rows like:
SELECT 1, 12, '20050101', 50000 UNION ALL
SELECT 2, 12, '20050720', 52010 UNION ALL
SELECT 3, 12, '20070201', null UNION ALL
SELECT 4, 12, '20080510', 53100 UNION ALL
SELECT 5, 12, '20080612', null UNION ALL
SELECT 6, 34, '20050401', 47000 UNION ALL
SELECT 7, 12, '20090301', null UNION ALL
SELECT 8, 12, '20090405', 56230 UNION ALL
SELECT 9, 12, '20090105', null
February 14, 2010 at 6:52 pm
lmu92 (2/14/2010)
@ Jeff: at a second thought I kinda disagree with your clustered index...Shouldn't it be on (EmployeeID, SalaryDate) instead of RowID?
What would happen with rows like:
SELECT 1, 12, '20050101', 50000 UNION ALL
SELECT 2, 12, '20050720', 52010 UNION ALL
SELECT 3, 12, '20070201', null UNION ALL
SELECT 4, 12, '20080510', 53100 UNION ALL
SELECT 5, 12, '20080612', null UNION ALL
SELECT 6, 34, '20050401', 47000 UNION ALL
SELECT 7, 12, '20090301', null UNION ALL
SELECT 8, 12, '20090405', 56230 UNION ALL
SELECT 9, 12, '20090105', null
I used the RowID based on the fact that it's a temp table and it appears to be correctly sorted by EmployeeID and SalaryDate... but, of course, your suggestion is much more correct.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 15, 2010 at 12:29 am
Thanks for the advice. It's been really helpful.
February 15, 2010 at 7:56 am
RBAR comes through again, good work.
Requirement added: However, I don’t want it to overwrite a null salary amount for the next employee if they happen to have a null at the start of their salary history.
CREATE TABLE #SalaryHistory
(RowID int, EmployeeID int, SalaryDate datetime, SalaryAmount money)
INSERT INTO #SalaryHistory (RowID, EmployeeID, SalaryDate, SalaryAmount)
SELECT 1, 12, '20050101', null UNION ALL
SELECT 2, 12, '20050720', 52010 UNION ALL
SELECT 3, 12, '20070201', null UNION ALL
SELECT 4, 12, '20080510', 53100 UNION ALL
SELECT 5, 12, '20080612', null UNION ALL
SELECT 6, 12, '20090105', null UNION ALL
SELECT 7, 12, '20090301', null UNION ALL
SELECT 8, 12, '20090405', 56230 UNION ALL
SELECT 9, 34, '20050401', 47000 UNION ALL
SELECT 10, 52, '20050101', null UNION ALL
SELECT 11, 52, '20050720', 52010 UNION ALL
SELECT 12, 52, '20070201', null UNION ALL
SELECT 13, 52, '20080510', 53100 UNION ALL
SELECT 14, 52, '20080612', null UNION ALL
SELECT 15, 52, '20090105', null UNION ALL
SELECT 16, 52, '20090301', null UNION ALL
SELECT 17, 52, '20090405', 56230 UNION ALL
SELECT 18, 54, '20050401', 47000
SELECT * FROM #SalaryHistory;
CREATE CLUSTERED INDEX IX_SalaryHistory_RowID
ON #SalaryHistory (EmployeeID, SalaryDate)
DECLARE @PrevSalaryAmount MONEY, @PrevEmployeeID int
SET @PrevSalaryAmount = NULL SET @PrevEmployeeID = 0
UPDATE #SalaryHistory
SET @PrevSalaryAmount = SalaryAmount =
CASE WHEN SalaryAmount IS NULL AND EmployeeID = @PrevEmployeeID
THEN @PrevSalaryAmount
ELSE SalaryAmount
END,
@PrevEmployeeID = EmployeeID
FROM #SalaryHistory WITH(TABLOCKX)
OPTION (MAXDOP 1);
SELECT * FROM #SalaryHistory;
😉
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply