June 17, 2010 at 2:13 am
I need to replace some values in a column (varchar(max)) and I want to avoid cursors.
I have a table which has an identifier and a text, and then another one which has the ID so it can be linked to the first table and a pair of old values and new values.
I need to replace in the text column of the first table all the old values with the new values.
This is easy using a cursor, but I need to avoid it and I'm trying to do it in a single update.
IF OBJECT_ID('TEMPDB..#FINAL') IS NOT NULL
DROP TABLE #FINAL
IF OBJECT_ID('TEMPDB..#VAL') IS NOT NULL
DROP TABLE #VAL
GO
CREATE TABLE #FINAL (
ID INT NOT NULL,
VAL VARCHAR(MAX) NULL)
CREATE TABLE #VAL (
ID INT NOT NULL,
OLD_VAL VARCHAR(30) NOT NULL,
NEW_VAL VARCHAR(30) NULL
)
CREATE UNIQUE CLUSTERED INDEX #IX_FINAL ON #FINAL(ID)
CREATE UNIQUE CLUSTERED INDEX #IX_VAL ON #VAL(ID, OLD_VAL)
INSERT INTO #FINAL VALUES(1, 'Some text AAA Another text BBB some more text')
INSERT INTO #FINAL VALUES(2, 'Some text CCC Another text DDD some more text')
INSERT INTO #VAL VALUES(1, 'AAA', 'NewAAA')
INSERT INTO #VAL VALUES(1, 'BBB', 'NewBBB')
INSERT INTO #VAL VALUES(2, 'CCC', 'NewCCC')
INSERT INTO #VAL VALUES(2, 'DDD', 'NewDDD')
DROP TABLE #FINAL
DROP TABLE #VAL
The expected result would be
ID VAL
1 Some text NewAAA Some Text NewBBB
2 Some text NewCCC Some Text NewDDD
Of course, my first attempt was a failure π
UPDATE F SET F.VAL=REPLACE(F.VAL, V.OLD_VAL, V.NEW_VAL)
FROM #VAL V WITH (INDEX(#IX_VAL), TABLOCKX)
, #FINAL F WITH (INDEX(#IX_FINAL), TABLOCKX)
WHERE F.ID=V.ID
produced (as expected)
ID VAL
1 Some text NewAAA Some Text BBB
2 Some text NewCCC Some Text DDD
Then I tried
DECLARE @val VARCHAR(MAX), @I INT
SET @I=0
SET @val=''
UPDATE F SET @val=F.VAL=REPLACE(CASE WHEN V.ID=@I THEN @val ELSE F.VAL END, V.OLD_VAL, V.NEW_VAL)
, @I=V.ID
FROM #VAL V
, #FINAL F
WHERE F.ID=V.ID
But the result was the same.
Looking at the Execution plan I see
1. a Clustered Index Scan on #VAL which returns 4 rows - so far so good π
2. a Stream Aggregate (Aggregate) operation which groups the 4 returned rows by ID, and I end up with only 2 rows.
Any suggestions on how can I avoid that Stream Aggregate, or any other grouping operation?
Thank you,
Daniel
June 17, 2010 at 2:31 am
Hi Daniel
In your sample data, you have two rows in table #VAL for each row in your update target table. Is this always the case? What's the maximum number of rows in #VAL sharing the same ID?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 17, 2010 at 2:40 am
Yes, for the same ID there can be multiple pairs of old value/new value.
As for the maximum number of pairs, ... usually there are 3 or 4 pairs.
June 17, 2010 at 2:57 am
Here's why I asked - a simple and fast way to get your results:
;WITH Sourcedata AS (
SELECT SubID = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY OLD_VAL),
ID, OLD_VAL, NEW_VAL
FROM #VAL
)
SELECT t.*, s1.*, s2.*,
AmendedTarget = REPLACE(REPLACE(T.VAL, s1.OLD_VAL, s1.NEW_VAL), s2.OLD_VAL, s2.NEW_VAL)
FROM #FINAL t
INNER JOIN Sourcedata s1 ON s1.ID = t.ID AND s1.SubID = 1
LEFT JOIN Sourcedata s2 ON s2.ID = t.ID AND s2.SubID = 2
-- account for more rows per ID
This will work fine with only three or four rows of changes per target.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 17, 2010 at 4:18 am
Chris, I see what you mean.
I'm not sure it works because the key word in my last post was usually. π
There can always be more than 4 pairs for the same ID and the cod will become really ugly.
But, this points me to another direction. Not quite one update, but at least it does not have cursors. And yes, I know it's a RBAR, but this is what I'm looking for
-- Get them all into one table
SELECT V.ID, V.OLD_VAL, V.NEW_VAL, F.VAL
INTO #TEMP
FROM #VAL V
JOIN #FINAL F ON F.ID=V.ID
ORDER BY V.ID, V.OLD_VAL -- order is important
DECLARE @val VARCHAR(MAX), @ID INT, @OLD_VAL VARCHAR(30)
SET @ID=0
SET @val=''
--CREATE CLUSTERED INDEX #IX_TEMP ON TEMP(ID, OLD_VAL) -- not sure if this is really needed
/*the idea is to update column VAL on each row with the new value and carry the
updated text to the next row, something like a running total.
Thanks to Jeff Moden http://www.sqlservercentral.com/articles/T-SQL/68467/
*/
UPDATE T SET
@val=T.VAL = REPLACE(CASE WHEN @ID<>ID THEN T.VAL ELSE @val END, T.OLD_VAL, T.NEW_VAL)
, @ID=ID --, @OLD_VAL= OLD_VAL
FROM #TEMP T WITH (INDEX(0), TABLOCKX) --index(0) forces a table scan
-- now Temp table has the desired text in the row with OLD_VAL = max(OLD_VAL)
UPDATE F SET F.VAL=T.VAL
FROM #FINAL F
JOIN (SELECT ID, MAX(OLD_VAL) AS OLD_VAL FROM #TEMP GROUP BY ID) X ON F.ID=X.ID
JOIN #TEMP T ON F.ID=T.ID AND X.OLD_VAL=T.OLD_VAL
And now, if anybody can improve this, I'll be happy to try it.
June 18, 2010 at 9:28 am
I would go at this from another direction:
Dynamically generate an UPDATE statement that does all the REPLACEs (or one UPDATE for each set of rows if there are variations for different blocks of rows).
For example:
DECLARE @sql1 varchar(max)
DECLARE @sql2 varchar(max)
DECLARE @newValCount int
SELECT @newValCount = COUNT(*)
FROM #VAL
SET @sql1 = 'UPDATE #FINAL SET VAL = ' +
REPLICATE('REPLACE(', @newValCount) + 'val'
SET @sql2 = ''
--you can replace this with a cursor if you're worried about errors w/ this method
SELECT @sql2 = @sql2 +
',''' + old_val + ''', ''' + new_val + ''')'
FROM #VAL
ORDER BY ID
PRINT @sql1 + @sql2
EXEC(@sql1 + @sql2)
SELECT *
FROM #FINAL
Scott Pletcher, SQL Server MVP 2008-2010
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply