Multiple updates on the same column

Question

MeThinks SSC Eights! Points: 851 · June 17, 2010 at 2:13 am · #236602

I need to replace some values in a column (varchar(max)) and I want to avoid cursors.
I have a table which has an identifier and a text, and then another one which has the ID so it can be linked to the first table and a pair of old values and new values.
I need to replace in the text column of the first table all the old values with the new values.
This is easy using a cursor, but I need to avoid it and I'm trying to do it in a single update.
IF OBJECT_ID('TEMPDB..#FINAL') IS NOT NULL
DROP TABLE #FINAL
IF OBJECT_ID('TEMPDB..#VAL') IS NOT NULL
DROP TABLE #VAL
GO
CREATE TABLE #FINAL (
ID INT NOT NULL,
VAL VARCHAR(MAX) NULL)
CREATE TABLE #VAL (
ID INT NOT NULL,
OLD_VAL VARCHAR(30) NOT NULL,
NEW_VAL VARCHAR(30) NULL
)
CREATE UNIQUE CLUSTERED INDEX #IX_FINAL ON #FINAL(ID)
CREATE UNIQUE CLUSTERED INDEX #IX_VAL ON #VAL(ID, OLD_VAL)
INSERT INTO #FINAL VALUES(1, 'Some text AAA Another text BBB some more text')
INSERT INTO #FINAL VALUES(2, 'Some text CCC Another text DDD some more text')
INSERT INTO #VAL VALUES(1, 'AAA', 'NewAAA')
INSERT INTO #VAL VALUES(1, 'BBB', 'NewBBB')
INSERT INTO #VAL VALUES(2, 'CCC', 'NewCCC')
INSERT INTO #VAL VALUES(2, 'DDD', 'NewDDD')
DROP TABLE #FINAL
DROP TABLE #VAL
The expected result would be:
ID VAL
1 Some text NewAAA Another text NewBBB some more text
2 Some text NewCCC Another text NewDDD some more text
Of course, my first attempt was a failure 🙂
UPDATE F SET F.VAL=REPLACE(F.VAL, V.OLD_VAL, V.NEW_VAL)
FROM #VAL V WITH (INDEX(#IX_VAL), TABLOCKX)
, #FINAL F WITH (INDEX(#IX_FINAL), TABLOCKX)
WHERE F.ID=V.ID
produced (as expected) only one replacement per row:
ID VAL
1 Some text NewAAA Another text BBB some more text
2 Some text NewCCC Another text DDD some more text
Then I tried
DECLARE @VAL VARCHAR(MAX), @I INT
SET @I=0
SET @VAL=''
UPDATE F SET @VAL=F.VAL=REPLACE(CASE WHEN V.ID=@I THEN @VAL ELSE F.VAL END, V.OLD_VAL, V.NEW_VAL)
, @I=V.ID
FROM #VAL V
, #FINAL F
WHERE F.ID=V.ID
But the result was the same.
Looking at the Execution plan I see
1. a Clustered Index Scan on #VAL which returns 4 rows - so far so good 🙂
2. a Stream Aggregate (Aggregate) operation which groups the 4 returned rows by ID, and I end up with only 2 rows.
Any suggestions on how I can avoid that Stream Aggregate, or any other grouping operation?
Thank you,
Daniel

ChrisM@Work SSC Guru Points: 186127 · Answer 1

Hi Daniel

In your sample data, you have two rows in table #VAL for each row in your update target table. Is this always the case? What's the maximum number of rows in #VAL sharing the same ID?

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

MeThinks SSC Eights! Points: 851 · Answer 2

Yes, for the same ID there can be multiple pairs of old value/new value.

As for the maximum number of pairs, usually there are 3 or 4.

ChrisM@Work SSC Guru Points: 186127 · Answer 3

Here's why I asked - a simple and fast way to get your results:

;WITH Sourcedata AS (
    SELECT SubID = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY OLD_VAL),
           ID, OLD_VAL, NEW_VAL
    FROM #VAL
)
SELECT t.*, s1.*, s2.*,
       AmendedTarget = REPLACE(REPLACE(t.VAL, s1.OLD_VAL, s1.NEW_VAL), s2.OLD_VAL, s2.NEW_VAL)
FROM #FINAL t
INNER JOIN Sourcedata s1 ON s1.ID = t.ID AND s1.SubID = 1
LEFT JOIN Sourcedata s2 ON s2.ID = t.ID AND s2.SubID = 2
-- account for more rows per ID

This will work fine with only three or four rows of changes per target.
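
To actually change the rows rather than just preview them, the SELECT above could be turned into an UPDATE along these lines. This is only a sketch, assuming #FINAL and #VAL from the question still exist and that no ID has more than two pairs. The CASE guard is an addition that is not in the query above: REPLACE returns NULL when any argument is NULL, so without it a row with a single pair would end up with a NULL VAL because of the LEFT JOIN.

```sql
;WITH Sourcedata AS (
    SELECT SubID = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY OLD_VAL),
           ID, OLD_VAL, NEW_VAL
    FROM #VAL
)
UPDATE t
SET VAL = CASE
              -- no second pair for this ID: apply only the first replacement
              WHEN s2.ID IS NULL
                  THEN REPLACE(t.VAL, s1.OLD_VAL, s1.NEW_VAL)
              -- two pairs: nest the replacements, first pair innermost
              ELSE REPLACE(REPLACE(t.VAL, s1.OLD_VAL, s1.NEW_VAL),
                           s2.OLD_VAL, s2.NEW_VAL)
          END
FROM #FINAL t
INNER JOIN Sourcedata s1 ON s1.ID = t.ID AND s1.SubID = 1
LEFT  JOIN Sourcedata s2 ON s2.ID = t.ID AND s2.SubID = 2;
```

Each extra pair per ID needs another LEFT JOIN and another CASE branch, and a NULL NEW_VAL (which #VAL allows) would still turn VAL into NULL.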

MeThinks SSC Eights! Points: 851 · Answer 4

Chris, I see what you mean.

I'm not sure it works because the key word in my last post was usually. 🙂

There can always be more than 4 pairs for the same ID, and the code will become really ugly.

But this points me in another direction. Not quite one update, but at least it does not have cursors. And yes, I know it's RBAR, but this is what I'm looking for:

-- Get them all into one table
SELECT V.ID, V.OLD_VAL, V.NEW_VAL, F.VAL
INTO #TEMP
FROM #VAL V
JOIN #FINAL F ON F.ID=V.ID
ORDER BY V.ID, V.OLD_VAL -- order is important
DECLARE @VAL VARCHAR(MAX), @ID INT, @OLD_VAL VARCHAR(30)
SET @ID=0
SET @VAL=''
--CREATE CLUSTERED INDEX #IX_TEMP ON #TEMP(ID, OLD_VAL) -- not sure if this is really needed
/* The idea is to update column VAL on each row with the new value and carry the
updated text to the next row, something like a running total.
Thanks to Jeff Moden http://www.sqlservercentral.com/articles/T-SQL/68467/
*/
UPDATE T SET
@VAL=T.VAL = REPLACE(CASE WHEN @ID<>ID THEN T.VAL ELSE @VAL END, T.OLD_VAL, T.NEW_VAL)
, @ID=ID --, @OLD_VAL= OLD_VAL
FROM #TEMP T WITH (INDEX(0), TABLOCKX) --index(0) forces a table scan
-- now #TEMP has the desired text in the row with OLD_VAL = max(OLD_VAL) for each ID
UPDATE F SET F.VAL=T.VAL
FROM #FINAL F
JOIN (SELECT ID, MAX(OLD_VAL) AS OLD_VAL FROM #TEMP GROUP BY ID) X ON F.ID=X.ID
JOIN #TEMP T ON F.ID=T.ID AND X.OLD_VAL=T.OLD_VAL

And now, if anybody can improve this, I'll be happy to try it.
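
One possible set-based alternative, swapped in here for the carried-variable update, is a recursive CTE that applies the pairs one at a time in OLD_VAL order and keeps the last step for each ID. This is a sketch only, assuming #FINAL and #VAL from the question still exist. The names Pairs, Chain, rn and cnt are made up for illustration, and treating a NULL NEW_VAL as "remove the old value" is an assumption, not something stated in the thread.

```sql
;WITH Pairs AS (
    -- number the pairs per ID and remember how many each ID has
    SELECT ID, OLD_VAL, NEW_VAL,
           rn  = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY OLD_VAL),
           cnt = COUNT(*)     OVER (PARTITION BY ID)
    FROM #VAL
),
Chain AS (
    -- anchor: apply the first pair to the original text
    SELECT F.ID, P.rn, P.cnt,
           VAL = CAST(REPLACE(F.VAL, P.OLD_VAL, ISNULL(P.NEW_VAL, '')) AS VARCHAR(MAX))
    FROM #FINAL F
    JOIN Pairs P ON P.ID = F.ID AND P.rn = 1
    UNION ALL
    -- recursive step: apply the next pair to the text produced so far
    SELECT C.ID, P.rn, P.cnt,
           CAST(REPLACE(C.VAL, P.OLD_VAL, ISNULL(P.NEW_VAL, '')) AS VARCHAR(MAX))
    FROM Chain C
    JOIN Pairs P ON P.ID = C.ID AND P.rn = C.rn + 1
)
UPDATE F
SET F.VAL = C.VAL
FROM #FINAL F
JOIN Chain C ON C.ID = F.ID AND C.rn = C.cnt;  -- keep only the fully replaced text

SELECT ID, VAL FROM #FINAL ORDER BY ID;
```

On the sample data this should leave 'Some text NewAAA Another text NewBBB some more text' for ID 1. It does not depend on the physical order of rows, but the default recursion limit of 100 caps it at roughly 100 pairs per ID unless a MAXRECURSION hint is added, and it has not been compared for speed against the approach above.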

scott.pletcher SSCertifiable Points: 5958 · Answer 5

I would go at this from another direction:

Dynamically generate an UPDATE statement that does all the REPLACEs (or one UPDATE for each set of rows if there are variations for different blocks of rows).

For example:

DECLARE @sql1 varchar(max)
DECLARE @sql2 varchar(max)
DECLARE @newValCount int
SELECT @newValCount = COUNT(*)
FROM #VAL
SET @sql1 = 'UPDATE #FINAL SET VAL = ' +
REPLICATE('REPLACE(', @newValCount) + 'val'
SET @sql2 = ''
--you can replace this with a cursor if you're worried about errors w/ this method
SELECT @sql2 = @sql2 +
',''' + old_val + ''', ''' + new_val + ''')'
FROM #VAL
ORDER BY ID
PRINT @sql1 + @sql2
EXEC(@sql1 + @sql2)
SELECT *
FROM #FINAL

Scott Pletcher, SQL Server MVP 2008-2010
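
The statement generated above applies every pair to every row, regardless of ID. If the replacements have to stay within their own ID, as the question describes, a per-ID variant of the same idea might look like the sketch below. It assumes #FINAL and #VAL from the question still exist. The FOR XML PATH concatenation, the doubling of embedded single quotes, and treating a NULL NEW_VAL as an empty string are all additions for illustration, not part of the code above.

```sql
DECLARE @sql NVARCHAR(MAX);

-- Build one UPDATE per ID, each with that ID's REPLACE calls nested in OLD_VAL order
SELECT @sql = (
    SELECT N'UPDATE #FINAL SET VAL = '
         + REPLICATE(N'REPLACE(', c.PairCount) + N'VAL'
         + (SELECT N', ''' + REPLACE(v.OLD_VAL, '''', '''''')
                 + N''', ''' + REPLACE(ISNULL(v.NEW_VAL, ''), '''', '''''')
                 + N''')'
            FROM #VAL v
            WHERE v.ID = c.ID
            ORDER BY v.OLD_VAL
            FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')
         + N' WHERE ID = ' + CAST(c.ID AS NVARCHAR(12)) + N'; '
    FROM (SELECT ID, COUNT(*) AS PairCount FROM #VAL GROUP BY ID) AS c
    ORDER BY c.ID
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)');

PRINT @sql;  -- inspect the generated statements before running them

IF @sql IS NOT NULL
    EXEC sys.sp_executesql @sql;

SELECT * FROM #FINAL;
```

For the sample data the first generated statement should read UPDATE #FINAL SET VAL = REPLACE(REPLACE(VAL, 'AAA', 'NewAAA'), 'BBB', 'NewBBB') WHERE ID = 1; followed by the equivalent statement for ID 2.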