How to replace a cell value with a cell value in previous row

Question

How to replace a cell value with a cell value in previous row

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004439 More actions · Answer 1

RBarryYoung (1/24/2009)
Just to clarify my earlier response to Hemin's question: I was assuming that he was asking about using SELECT pseudocursors to do this. There are of course lots of ways to accomplish the same result, many of which use SELECT at one point or another.

Yep, I agree...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RBarryYoung SSC Guru Points: 143329 More actions · Answer 2

OK, I decided that I wanted to test this with a substantial sized dataset. Here is the script to generate apprx 1 million rows:

Drop Table TestPCursors

go

CREATE TABLE TestPCursors

(OrderNo INT NOT NULL, SeqNo INT NOT NULL, Type CHAR(2))

ALTER TABLE TestPCursors

ADD PRIMARY KEY CLUSTERED (OrderNo, SeqNo)

go

;With cteNumbers as (Select TOP 300

ROW_NUMBER() Over(Order by object_id) as Num

From master.sys.system_columns)

INSERT into TestPCursors

Select (Num+OBJECT_ID) as OrderNo

, column_id

MAX((Num+Object_Id+column_id)%7)

, MAX(Case When column_id=1 then Left(name,2)

When (Object_Id+column_id)%5=0

Then Left(name,2)

When ((Num+Object_Id+column_id)%7)=0 Then Left(name,2)

Else '' End)

From master.sys.system_columns

Cross Join cteNumbers

Where (Num+Object_Id) NOT IN(Select object_id From master.sys.system_columns)

Group By (Num+OBJECT_ID), column_id

This takes a couple of minutes to load the data.

I ran Jeff's code against this and it took 7 seconds.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 3

So here's my code using the SELECT pseudocursor hot-wired to act like an UPDATE pseudocursor, plus every trick I could think of to remedy the problem of SQL string aggregation:

Set NoCount ON

Select convert(varchar(24), GETDATE(), 121)

--======

DECLARE @PrevType CHAR(2)

Declare @buffer varbinary(MAX)

Select @buffer = Cast('' as varbinary)

Declare @b6 varbinary(2000)

, @b8 varbinary(8000)

, @b10 varbinary(MAX)

, @b12 varbinary(MAX)

, @b14 varbinary(MAX)

, @b16 varbinary(MAX)

, @b18 varbinary(MAX)

, @bFinal varbinary(MAX)

Select @b6 =Cast('' as varbinary)

, @b8 =Cast('' as varbinary)

, @b10 =Cast('' as varbinary)

, @b12 =Cast('' as varbinary)

, @b14 =Cast('' as varbinary)

, @b16 =Cast('' as varbinary)

, @b18 =Cast('' as varbinary)

, @bFinal =Cast('' as varbinary)

Select TOP 1000000

@buffer = @buffer + Cast(OrderNo as varbinary(4))

+ Cast(SeqNo as varbinary(4))

+ Cast(Case When [Type]=' ' Then @PrevType Else [Type] End as BINARY(2))

, @PrevType = CASE When [Type]=' ' Then @PrevType Else [Type] End

, @b6 = CASE When RowNum%64=0 Then @b6 + @buffer Else @b6 End

, @buffer = CASE When RowNum%64=0 Then Cast('' as binary(2)) Else @buffer End

, @b8 = CASE When RowNum%256=0 Then @b8 + @b6 Else @b8 End

, @b6 = CASE When RowNum%256=0 Then Cast('' as varbinary) Else @b6 End

, @b10 = CASE When RowNum%1024=0 Then @b10 + @b8 Else @b10 End

, @b8 = CASE When RowNum%1024=0 Then Cast('' as varbinary) Else @b8 End

, @b12 = CASE When RowNum%4096=0 Then @b12 + @b10 Else @b12 End

, @b10 = CASE When RowNum%4096=0 Then Cast('' as varbinary) Else @b10 End

, @b14 = CASE When RowNum%16384=0 Then @b14 + @b12 Else @b14 End

, @b12 = CASE When RowNum%16384=0 Then Cast('' as varbinary) Else @b12 End

, @b16 = CASE When RowNum%65768=0 Then @b16 + @b14 Else @b16 End

, @b14 = CASE When RowNum%65768=0 Then Cast('' as varbinary) Else @b14 End

, @b18 = CASE When RowNum%263072=0 Then @b18 + @b16 Else @b18 End

, @b16 = CASE When RowNum%263072=0 Then Cast('' as varbinary) Else @b16 End

From (Select *, (ROW_NUMBER() Over(Order By OrderNo, SeqNo)) as RowNum

From TestPCursors) A

Order By OrderNo, SeqNo

Select @bFinal =@b18 + @b16 + @b14 + @b12 + @b10 + @b8 + @b6 + @buffer

--======

Select convert(varchar(24), GETDATE(), 121)

;With cteNumbers as (Select TOP 10000--00

ROW_NUMBER() Over(Order By c1.object_id) as RowNum

From master.sys.system_columns c1, master.sys.system_columns c2)

Select Cast(SUBSTRING(@bFinal, (RowNum-1)*10+1, 4) as int) as OrderNo

, Cast(SUBSTRING(@bFinal, (RowNum-1)*10+5, 4) as int) as SeqNo

, Cast(SUBSTRING(@bFinal, (RowNum-1)*10+9, 2) as CHAR(2)) as [Type]

Into #tempOut --dump into temp table, to factor out display timing differences

From cteNumbers

--======

Select convert(varchar(24), GETDATE(), 121)

Drop table #tempOut

This code runs in 10-11 seconds on my system, which I think is about as fast as this trick can be made to run.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

hemin.shah85 Mr or Mrs. 500 Points: 557 More actions · Answer 4

Thank you so much Jeff and Young. I guess as a newbie its worth it for me to go for a TempTable, it works for me. And by the way Jeff I do read that article, its awesome. I guess it will take a little while to shift higher gear.:)

Garadin One Orange Chip Points: 29613 More actions · Answer 5

RBarryYoung (1/24/2009)
So here's my code using the SELECT pseudocursor hot-wired to act like an UPDATE pseudocursor, plus every trick I could think of to remedy the problem of SQL string aggregation:
Set NoCount ON
Select convert(varchar(24), GETDATE(), 121)
--======
DECLARE @PrevType CHAR(2)
Declare @buffer varbinary(MAX)
Select @buffer = Cast('' as varbinary)
Declare @b6 varbinary(2000)
, @b8 varbinary(8000)
, @b10 varbinary(MAX)
, @b12 varbinary(MAX)
, @b14 varbinary(MAX)
, @b16 varbinary(MAX)
, @b18 varbinary(MAX)
, @bFinal varbinary(MAX)
Select @b6 =Cast('' as varbinary)
, @b8 =Cast('' as varbinary)
, @b10 =Cast('' as varbinary)
, @b12 =Cast('' as varbinary)
, @b14 =Cast('' as varbinary)
, @b16 =Cast('' as varbinary)
, @b18 =Cast('' as varbinary)
, @bFinal =Cast('' as varbinary)
Select TOP 1000000
@buffer = @buffer + Cast(OrderNo as varbinary(4))
+ Cast(SeqNo as varbinary(4))
+ Cast(Case When [Type]=' ' Then @PrevType Else [Type] End as BINARY(2))
, @PrevType = CASE When [Type]=' ' Then @PrevType Else [Type] End
, @b6 = CASE When RowNum%64=0 Then @b6 + @buffer Else @b6 End
, @buffer = CASE When RowNum%64=0 Then Cast('' as binary(2)) Else @buffer End
, @b8 = CASE When RowNum%256=0 Then @b8 + @b6 Else @b8 End
, @b6 = CASE When RowNum%256=0 Then Cast('' as varbinary) Else @b6 End
, @b10 = CASE When RowNum%1024=0 Then @b10 + @b8 Else @b10 End
, @b8 = CASE When RowNum%1024=0 Then Cast('' as varbinary) Else @b8 End
, @b12 = CASE When RowNum%4096=0 Then @b12 + @b10 Else @b12 End
, @b10 = CASE When RowNum%4096=0 Then Cast('' as varbinary) Else @b10 End
, @b14 = CASE When RowNum%16384=0 Then @b14 + @b12 Else @b14 End
, @b12 = CASE When RowNum%16384=0 Then Cast('' as varbinary) Else @b12 End
, @b16 = CASE When RowNum%65768=0 Then @b16 + @b14 Else @b16 End
, @b14 = CASE When RowNum%65768=0 Then Cast('' as varbinary) Else @b14 End
, @b18 = CASE When RowNum%263072=0 Then @b18 + @b16 Else @b18 End
, @b16 = CASE When RowNum%263072=0 Then Cast('' as varbinary) Else @b16 End
From (Select *, (ROW_NUMBER() Over(Order By OrderNo, SeqNo)) as RowNum
From TestPCursors) A
Order By OrderNo, SeqNo
Select @bFinal =@b18 + @b16 + @b14 + @b12 + @b10 + @b8 + @b6 + @buffer
--======
Select convert(varchar(24), GETDATE(), 121)
;With cteNumbers as (Select TOP 10000--00
ROW_NUMBER() Over(Order By c1.object_id) as RowNum
From master.sys.system_columns c1, master.sys.system_columns c2)
Select Cast(SUBSTRING(@bFinal, (RowNum-1)*10+1, 4) as int) as OrderNo
, Cast(SUBSTRING(@bFinal, (RowNum-1)*10+5, 4) as int) as SeqNo
, Cast(SUBSTRING(@bFinal, (RowNum-1)*10+9, 2) as CHAR(2)) as [Type]
Into #tempOut --dump into temp table, to factor out display timing differences
From cteNumbers
--======
Select convert(varchar(24), GETDATE(), 121)
Drop table #tempOut

This code runs in 10-11 seconds on my system, which I think is about as fast as this trick can be made to run.

OK, I'll bite. What the heck is goin on there? I've looked at it a few times, and it's not making any more sense than the first time I read it. Tried to run it and see, but get errors creating the sample data table. I'm sure it's quite the trick, and definitely not one I've seen before.

(Msg 102, Level 15, State 1, Line 7

Incorrect syntax near 'Num'.

Msg 156, Level 15, State 1, Line 16

Incorrect syntax near the keyword 'Group'.)

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 6

Garadin (1/25/2009)
OK, I'll bite. What the heck is goin on there? I've looked at it a few times, and it's not making any more sense than the first time I read it. Tried to run it and see, but get errors creating the sample data table. I'm sure it's quite the trick, and definitely not one I've seen before.
(Msg 102, Level 15, State 1, Line 7
Incorrect syntax near 'Num'.
Msg 156, Level 15, State 1, Line 16
Incorrect syntax near the keyword 'Group'.)

Arrgh! My bad, sorry. I've been writing my queries on SQL 2008 lately to get more used to it and I forgot to test it on SQL 2005. Hang on, I'll get a corrected version...

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 7

Actually, I think that it was just a typo from a left over diagnostic column.

Anyway, here is a slightly reqwritten version of the Load script:

Drop Table TestPCursors

go

CREATE TABLE TestPCursors

(OrderNo INT NOT NULL, SeqNo INT NOT NULL, Type CHAR(2))

ALTER TABLE TestPCursors

ADD PRIMARY KEY CLUSTERED (OrderNo, SeqNo)

go

;With cteNumbers as (Select TOP 300

ROW_NUMBER() Over(Order by object_id) as Num

From master.sys.system_columns)

, cteCol300 as (Select Object_id

, Num

, column_id

, name

, Num+object_id as OrderNo

, Num+object_id+column_id as ShiftStart1

, object_id+column_id as ShiftStart2

From master.sys.system_columns

Cross Join cteNumbers)

INSERT into TestPCursors

Select OrderNo

, column_id

, MAX(Case When column_id=1 Then Left(name,2)

When ShiftStart2%5=0 Then Left(name,2)

When (ShiftStart1%7)=0 Then Left(name,2)

Else '' End)

From cteCol300

Group By OrderNo, column_id

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 8

OK, I checked my solution code on 2005, and it works there too In fact its even faster, 8.5 seconds.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 9

Here is the vastly simpler non-optimized version:

DECLARE @PrevType CHAR(2)

Declare @buffer varchar(MAX)

Select @buffer = ''

--====== Load the buffer with all of the output values

Select TOP 10000--00

@buffer = @buffer + Cast(OrderNo as CHAR(11))

+ Cast(SeqNo as CHAR(3))

+ Cast(Case When [Type]=' ' Then @PrevType Else [Type] End as CHAR(2))

, @PrevType = CASE When [Type]=' ' Then @PrevType Else [Type] End

From (Select *, (ROW_NUMBER() Over(Order By OrderNo, SeqNo)) as RowNum

From TestPCursors) A

Order By OrderNo, SeqNo

--====== Now extract and display the output values for every row

;With cteNumbers as (Select TOP 10000--00

ROW_NUMBER() Over(Order By c1.object_id) as RowNum

From master.sys.system_columns c1, master.sys.system_columns c2)

Select Cast(SUBSTRING(@buffer, (RowNum-1)*16+1, 11) as int) as OrderNo

, Cast(SUBSTRING(@buffer, (RowNum-1)*16+12, 3) as int) as SeqNo

, Cast(SUBSTRING(@buffer, (RowNum-1)*16+15, 2) as CHAR(2)) as [Type]

From cteNumbers

Note that because it is so slow, i have restricted it to just 10,000 rows.

Logically, this is the same algorithm as the optimized version, so hopefully you can see where this one is going...

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 10

OK, how it works. First the unoptimized (simple) version:

The idea here is, just like an Update pseudocursor, use a Select pseudocursor to accumulate state across rows, to determine the handling of the output columns within each row. Two important differences, however. First, because SELECT cannot write to bot variables and output columns, I use a single string variable (@buffer) to hold all of my output columns and rows. The second difference is that because I am using ORDER BY on the outer select to control my order, this technique is completely supported.

Now, once this massive string (apprx. 16 MB) is created, then I use a Numbers/Tally table to enumerate all of the output rows that I want, and then extract the columns values from the giant string buffer, convert them to the correct data type and output them.

So, why is it so slow? Because simple/naive string accumulation is essentially a Triangular operation. Consider this:

row 001, add 'A': @b-2 = '' + 'A'; cost of this row is 1

row 002, add 'B': @b-2 = 'A' + 'B'; cost of this row is 2

row 003, add 'C': @b-2 = 'AB' + 'C'; cost of this row is 3

row 004, add 'D': @b-2 = 'ABC' + 'D'; cost of this row is 4

...

and the cost of the millionth rows is 1,000,000. Add up the cost of every row to get the total cost and you get 5,000,005,000,000, which is a very big number indeed (five trillion, five million). And this was the motivation for the much more complicated stuff that you see in the optimized version.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 11

The second performance problem that the unoptimized version has is that the second select is much slower than I thought it would be. Because I know that the conversion from character string to INT is a high overhead operation, I figured to circumvent this problem by using binary strings instead of character strings.

Converting to or from binary is really just a memcopy operation with no conditions or calculations necessary, with a different datatype applied to the output, so it should be much more efficient. And that is why the optimized version uses binary strings everywhere. This does help with the efficiency of that step, but not nearly as much as I had hoped.

And this still does nothing for the Triangular nature of string accumulation...

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Garadin One Orange Chip Points: 29613 More actions · Answer 12

I get it now. Great explanation. So you've also found that it's quicker to dump repeatedly into the slightly bigger varbinary's than just dumping into one big one whenever you pass a certain number?

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 13

Garadin (1/25/2009)
I get it now. Great explanation. So you've also found that it's quicker to dump repeatedly into the slightly bigger varbinary's than just dumping into one big one whenever you pass a certain number?

Exactly Seth. Consider the following simplified examples, first unoptimized:

Row Step Row Step Total

Cost Cost Cost

1 a 1 1 @b-2 =''+'A' 1

2 a 2 2 @b-2='A'+'B' 3

3 a 3 3 @b-2='AB'+'C' 6

4 a 4 4 @b-2='ABC'+'D' 10

5 a 5 5 @b-2='ABCD'+'E' 15

6 a 6 6 @b-2='ABCDE'+'F' 21

7 a 7 7 @b-2='ABCDEF'+'G' 28

8 a 8 8 @b-2='ABCDEFG'+'H' 36

Now a slightly optimized version:

Row Step Row Step Total

Cost Cost Cost

1 a 1 1 @b-2 =''+'A' 1

2 a 2 2 @b-2='A'+'B' 3

3 a 3 3 @b-2='AB'+'C' 6

4 a 4 @b-2='ABC'+'D'

b 4 @b2=''+'ABCD'

c 8 0 @b-2='' 14

5 a 1 1 @b-2=''+'E' 15

6 a 2 2 @b-2='E'+'F' 17

7 a 3 3 @b-2='EF'+'G' 20

8 a 4 4 @b-2='EFG'+'H'

b 8 @b2='ABCD'+'EFGH'

c 12 0 @b-2='' 32

Now this improvement may seem small (10%), but this is greatly magnified by larger numbers, and then multiplied by stacking them up. The optimizations that I implemented for this improved the overall string performance by 100 to 1000x.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 14

I should also add, that my previous numbers were incorrect (left a debugging limiter in), the total time is 30 seconds, compared to Jeffs 7+ seconds.

Only 8 seconds are in the string accumulator, the other 22 seconds are (suprisingly) in the row enumerator SELECT. Using varbinary helped some, but not nearly enough and at this point, I cannot figure out what is taking it so long.

Any ideas, Jeff?

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Garadin One Orange Chip Points: 29613 More actions · Answer 15

This is basically gap finding, and it seems like you should be able to use a variation of Jeff's identity gap finding technique for this. I've been trying to adapt it to fit this, but keep having to make too many references to the table to do it. I feel like I'm overlooking something basic.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]