How do I pass a value in a field to the next row where the field is null in one or more in the following rows

Question

How do I pass a value in a field to the next row where the field is null in one or more in the following rows

AndyAus

SSC Enthusiast

Points: 100
More actions
February 14, 2010 at 2:38 pm

#386311

I’ve written a common table expression that combines complex information on staff movements, allowances paid and salary increments from various tables into one table that correctly shows what a person has been paid in terms of allowances and salary over time. However, due to the way the data is stored in the database (staff movements and allowances are recorded with a start and end date and salary increments are recorded at a point in time), the salary amount appears as a null if an increment doesn’t occur within a staff movement period. The data looks like this:
CREATE TABLE #SalaryHistory
(RowID int, EmployeeID int, SalaryDate datetime, SalaryAmount money)
INSERT INTO #SalaryHistory (RowID, EmployeeID, SalaryDate, SalaryAmount)
SELECT 1, 12, '20050101', 50000 UNION ALL
SELECT 2, 12, '20050720', 52010 UNION ALL
SELECT 3, 12, '20070201', null UNION ALL
SELECT 4, 12, '20080510', 53100 UNION ALL
SELECT 5, 12, '20080612', null UNION ALL
SELECT 6, 12, '20090105', null UNION ALL
SELECT 7, 12, '20090301', null UNION ALL
SELECT 8, 12, '20090405', 56230 UNION ALL
SELECT 9, 34, '20050401', 47000
select * from #SalaryHistory order by 2, 3
drop table #SalaryHistory
But should look like this:
CREATE TABLE #SalaryHistoryComplete
(RowID int, EmployeeID int, SalaryDate datetime, SalaryAmount money)
INSERT INTO #SalaryHistoryComplete (RowID, EmployeeID, SalaryDate, SalaryAmount)
SELECT 1, 12, '20050101', 50000 UNION ALL
SELECT 2, 12, '20050720', 52010 UNION ALL
SELECT 3, 12, '20070201', 52010 UNION ALL
SELECT 4, 12, '20080510', 53100 UNION ALL
SELECT 5, 12, '20080612', 53100 UNION ALL
SELECT 6, 12, '20090105', 53100 UNION ALL
SELECT 7, 12, '20090301', 53100 UNION ALL
SELECT 8, 12, '20090405', 56230 UNION ALL
SELECT 9, 34, '20050401', 47000
select * from #SalaryHistoryComplete order by 2, 3
drop table #SalaryHistoryComplete
What I want to do is to replace null salary amounts with the value that appears in the preceding row and to continue doing this until up until the next row with a value in that column. However, I don’t want it to overwrite a null salary amount for the next employee if they happen to have a null at the start of their salary history. What would be the best way of doing this (preferably not using a cursor)? As I’m a relative newby, would it be possible for a working example of the code to be written for the first sample table above so that I can see how it works? Thanks.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

I'm pretty sure there will be a more elegant solution posted shortly but meanwhile here's what I came up with:

(Note: once you reviewed the result, change the SELECT line to the UPDATE line and check if this is what you're looking for)

;WITH cte AS -- get RowIds with NULL values for SalaryAmount

(

SELECT RowID,EmployeeID

FROM #SalaryHistory

WHERESalaryAmount IS NULL

)

,

cte2 AS -- get RowId for each NULL value from previous cte with values for SalaryAmount NOT NULL

(

SELECT cte.RowID,cte.EmployeeID,MAX(#SalaryHistory.rowid) AS s

FROM #SalaryHistory

INNER JOIN cte ON cte.EmployeeID=#SalaryHistory.EmployeeID

WHERE #SalaryHistory.SalaryAmount IS NOT NULL

AND #SalaryHistory.RowID<cte.RowID

GROUP BY cte.RowID,cte.EmployeeID

)

,

cte3 AS -- assign SalaryAmount to the RowId's found in cte2

(

SELECT cte2.rowid,sh.salaryamount

FROM cte2

INNER JOIN #SalaryHistory sh ON cte2.s=sh.rowid

)

--UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount

SELECT cte3.*

FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

SELECT * FROM #SalaryHistory ORDER BY EmployeeID, SalaryDate

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Ron McCullough SSC Guru Points: 63877 More actions · Answer 2

lmu92 Your code:

UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount

SELECT cte3.*

FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

Your code kept giving me an error The multi-part identifier "cte3.salaryamount" could not be bound after uncommenting the UPDATE statement

Looks like your cut and paste sort goofed up a slight amount. Had to add a bit of a statement to get it to work -- or am I doing something wrong?

UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount

--=== added code ===

FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

--===End addition

SELECT cte3.*

FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

LutzM SSC Guru Points: 107049 More actions · Answer 3

@Bitbucket:

My code snippet has the following lines at then end:

--UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount

SELECT cte3.*

FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

The UPDATE and the SELECT part are alternatives. The way you changed it would leave the UPDATE without any referenced table causing the error.

If you want to run the UPDATE you need to uncomment the UPDATE part and comment out the SELECT part:

UPDATE #SalaryHistory SET #SalaryHistory.salaryamount = cte3.salaryamount

--SELECT cte3.*

FROM cte3 INNER JOIN #SalaryHistory ON cte3.rowid=#SalaryHistory.rowid

Seems like my description ("change the SELECT line to the UPDATE line") wasn't clear enough... I'm sorry!

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Ron McCullough SSC Guru Points: 63877 More actions · Answer 4

Imu92

My apologies ... for not being more careful .... anyway I hope the end result is that to poster of the question will understand it better than I did

Again except my apologies for misunderstanding

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

LutzM SSC Guru Points: 107049 More actions · Answer 5

There's no reason to apologize, Ron!!

If there's anybody that made a mistake than it will be me, since I didn't describe the usage of my script well enough so I confused you!

Fortunately, you brought up the issue so I could clarify it and the OP and/or others will (hopefully) benefit from it. Thank you again!

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Jeff Moden SSC Guru Points: 1004340 More actions · Answer 6

Do it all at once and without a triangular join...

CREATE CLUSTERED INDEX IX_SalaryHistory_RowID

ON #SalaryHistory (RowID)

DECLARE @PrevSalaryAmount MONEY

UPDATE #SalaryHistory

SET @PrevSalaryAmount = SalaryAmount = CASE WHEN SalaryAmount >= 0 THEN SalaryAmount ELSE @PrevSalaryAmount END

FROM #SalaryHistory WITH(TABLOCKX)

OPTION (MAXDOP 1)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

LutzM SSC Guru Points: 107049 More actions · Answer 7

@andrew:

If you consider using Jeffs solution (which performs much better than mine) I think it's mandatory to read the Running Total / quirky update article[/url] including the discussion.

I think it's really important to understand how it works and why Jeff posted the code the way he did (including clustered index, table lock and MAXDOP option).

Jeff Moden (2/14/2010)
Do it all at once and without a triangular join...
CREATE CLUSTERED INDEX IX_SalaryHistory_RowID
ON #SalaryHistory (RowID)
DECLARE @PrevSalaryAmount MONEY
UPDATE #SalaryHistory
SET @PrevSalaryAmount = SalaryAmount = CASE WHEN SalaryAmount >= 0 THEN SalaryAmount ELSE @PrevSalaryAmount END
FROM #SalaryHistory WITH(TABLOCKX)
OPTION (MAXDOP 1)

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

LutzM SSC Guru Points: 107049 More actions · Answer 8

@ Jeff: at a second thought I kinda disagree with your clustered index...

Shouldn't it be on (EmployeeID, SalaryDate) instead of RowID?

What would happen with rows like:

SELECT 1, 12, '20050101', 50000 UNION ALL

SELECT 2, 12, '20050720', 52010 UNION ALL

SELECT 3, 12, '20070201', null UNION ALL

SELECT 4, 12, '20080510', 53100 UNION ALL

SELECT 5, 12, '20080612', null UNION ALL

SELECT 6, 34, '20050401', 47000 UNION ALL

SELECT 7, 12, '20090301', null UNION ALL

SELECT 8, 12, '20090405', 56230 UNION ALL

SELECT 9, 12, '20090105', null

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Jeff Moden SSC Guru Points: 1004340 More actions · Answer 9

lmu92 (2/14/2010)
@ Jeff: at a second thought I kinda disagree with your clustered index...
Shouldn't it be on (EmployeeID, SalaryDate) instead of RowID?
What would happen with rows like:
SELECT 1, 12, '20050101', 50000 UNION ALL
SELECT 2, 12, '20050720', 52010 UNION ALL
SELECT 3, 12, '20070201', null UNION ALL
SELECT 4, 12, '20080510', 53100 UNION ALL
SELECT 5, 12, '20080612', null UNION ALL
SELECT 6, 34, '20050401', 47000 UNION ALL
SELECT 7, 12, '20090301', null UNION ALL
SELECT 8, 12, '20090405', 56230 UNION ALL
SELECT 9, 12, '20090105', null

I used the RowID based on the fact that it's a temp table and it appears to be correctly sorted by EmployeeID and SalaryDate... but, of course, your suggestion is much more correct.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

AndyAus SSC Enthusiast Points: 100 More actions · Answer 10

AndyAus

SSC Enthusiast

Points: 100

February 15, 2010 at 12:29 am

#1118683

Thanks for the advice. It's been really helpful.

benkoskysa SSC Enthusiast Points: 149 More actions · Answer 11

RBAR comes through again, good work.

Requirement added: However, I don’t want it to overwrite a null salary amount for the next employee if they happen to have a null at the start of their salary history.

CREATE TABLE #SalaryHistory

(RowID int, EmployeeID int, SalaryDate datetime, SalaryAmount money)

INSERT INTO #SalaryHistory (RowID, EmployeeID, SalaryDate, SalaryAmount)

SELECT 1, 12, '20050101', null UNION ALL

SELECT 2, 12, '20050720', 52010 UNION ALL

SELECT 3, 12, '20070201', null UNION ALL

SELECT 4, 12, '20080510', 53100 UNION ALL

SELECT 5, 12, '20080612', null UNION ALL

SELECT 6, 12, '20090105', null UNION ALL

SELECT 7, 12, '20090301', null UNION ALL

SELECT 8, 12, '20090405', 56230 UNION ALL

SELECT 9, 34, '20050401', 47000 UNION ALL

SELECT 10, 52, '20050101', null UNION ALL

SELECT 11, 52, '20050720', 52010 UNION ALL

SELECT 12, 52, '20070201', null UNION ALL

SELECT 13, 52, '20080510', 53100 UNION ALL

SELECT 14, 52, '20080612', null UNION ALL

SELECT 15, 52, '20090105', null UNION ALL

SELECT 16, 52, '20090301', null UNION ALL

SELECT 17, 52, '20090405', 56230 UNION ALL

SELECT 18, 54, '20050401', 47000

SELECT * FROM #SalaryHistory;

CREATE CLUSTERED INDEX IX_SalaryHistory_RowID

ON #SalaryHistory (EmployeeID, SalaryDate)

DECLARE @PrevSalaryAmount MONEY, @PrevEmployeeID int

SET @PrevSalaryAmount = NULL SET @PrevEmployeeID = 0

UPDATE #SalaryHistory

SET @PrevSalaryAmount = SalaryAmount =

CASE WHEN SalaryAmount IS NULL AND EmployeeID = @PrevEmployeeID

THEN @PrevSalaryAmount

ELSE SalaryAmount

END,

@PrevEmployeeID = EmployeeID

FROM #SalaryHistory WITH(TABLOCKX)

OPTION (MAXDOP 1);

SELECT * FROM #SalaryHistory;

😉