While loop to replace text - how to re-write to use set processing

Question

While loop to replace text - how to re-write to use set processing

FridayNightGiant

SSCertifiable

Points: 7151
More actions
August 12, 2015 at 6:17 am

#320454

All,
I’ve created a script which will do the following.
Update fields in a database which contain instances of an order number. An order number is defined as a 10 digit numeric sting which beings with 998. The first 3 digits of the order number need to change from 998 to 999. There are two types of fields to update: document number fields which may contain 1 instance of the order number and free text fields which may contain multiple instances of an order number.
I created a function which accepts the text to be updated, the text to find and the text to replace it with. The function then loops through the sting for instances of 998. For each instance it finds it checks to ensure that it is at the start of a 10 digit string which contains only numeric numbers – if this is the case then it will update the 998 to be 999.
If the field is free text then it will continue to loop through the string (it will skip forward 10 digits for every order number found) until the end. If the field is not free text then it will exit the script after the first instance found which matches the above criteria, if any.
I was wondering if someone could point in me in the right direction of achieving this via set processing and not having to loop through every line. I’ve included the function below and some test data.
Thanks in advance.
--Create the function we will call in order to do the replace
if object_id(N'OrderReplace',N'FN') is not null
drop function dbo.OrderReplace;
go
create function dbo.OrderReplace (
@Textnvarchar(4000),
@FindTextnvarchar(6),
@ReplaceTextnvarchar(6),
@IsFreeTextbit
)
returns nvarchar(4000)
as
begin
declare@FindLocation as int,
@TotalLen as int,
@FindLen as int,
@StartPosition as int = 1;
select @TotalLen = len(@Text);
select @FindLen = len(@FindText);
while (@StartPosition < @TotalLen) --while the starting position for the find is less than the total length
begin
select @FindLocation = charindex(@FindText, @Text, @StartPosition); --get the location of the first instance of FindText from the start position
if @FindLocation = 0 -- if nothing found then exit loop
break;
--ensure @FindText is the start of a 10 digit numberic string. This way we can be confident that its a maint order no.
if len(substring(@Text,@FindLocation, 10)) = 10 and isnumeric(substring(@Text,@FindLocation, 10)+'.e0') = 1 --add .e0 to stop isnumeric returning true if a comma or period exists in string already
begin
select @Text = stuff(@Text, @FindLocation,len(substring(@Text, @FindLocation, @TotalLen)), @ReplaceText + substring(@Text, @FindLocation+@FindLen, @TotalLen)); --remove 3 charaters from the string at the findlocation and replace with 181
if @IsFreeText = 1 --if it is a free text then need to loop through the string, otherwise break out of loop
select @StartPosition = @FindLocation + 9; -- set the new start position by using the find location and adding the number of charaters to skip.
else
break;
end
else
select @StartPosition = @FindLocation + @FindLen; -- set the new start position by using the find location and adding the number of charaters to skip.
end
return @Text;
end
go
with x as (
select '2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT
union
select '2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT
union
select '2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT
union
select '2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT
union
select '2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT
)
selectdbo.OrderReplace(DOCNUMBER, '998', '999', 0)
,dbo.OrderReplace(DOCFREETEXT, '998', '999',1)
fromx
Expected Output
DOCNUMBERDOCFREETEXT
2015/3352424Testing 9997998646
2015/33563/RA/9997998647 Testing 9997998647
2015/3356324Testing 9997895645 998
2015/33564/RA/9994564658 Testing 9994564658,9994564659,9994564660 998
2015/3999324Testing 9997895646

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Jayanth_Kurup SSC-Insane Points: 22967 More actions · Answer 1

In your expected output i see there is a comma delimiter between strings, is this the way the data is stored or added by you when posting the query?

if there is a comma delimited value between strings then a simple way to acheive the results is

-Split the entire set into two data sets

- Dataset1 contains rows with only a single value for order num ( then simply use the left function to replace as needed)

- Dataset2 contains rows with a comma delimited string for order numbers , here you can use the replace function (inclusive of the comma and 998 to be repalced with comma and 999)

Jayanth Kurup[/url]

FridayNightGiant SSCertifiable Points: 7151 More actions · Answer 2

Jayanth_Kurup (8/12/2015)
In your expected output i see there is a comma delimiter between strings, is this the way the data is stored or added by you when posting the query?
if there is a comma delimited value between strings then a simple way to acheive the results is
-Split the entire set into two data sets
- Dataset1 contains rows with only a single value for order num ( then simply use the left function to replace as needed)
- Dataset2 contains rows with a comma delimited string for order numbers , here you can use the replace function (inclusive of the comma and 998 to be repalced with comma and 999)

Thanks for the reply. No the data isn't comma delimited. This was an example of data in a free text field where a user can enter anything they want, they might enter a single order number and some text, they might enter text and multiple order numbers separated by a space,comma,dash etc. so there is no logic to how the order numbers will be in this field.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 3

Here's a possible option. Be sure to test what it does and understand it.

It basically splits the strings to find the different digit portions by using the PatternSplitCM which can be found here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/

It then joins the strings again using this technique: http://www.sqlservercentral.com/articles/comma+separated+list/71700/

create function dbo.itvf_OrderReplace (

@Textnvarchar(4000),

@FindTextnvarchar(6),

@ReplaceTextnvarchar(6),

@IsFreeTextbit

)

returns table

as

RETURN

SELECT (SELECT CASE WHEN Matched = 1 --Only digits

AND Item LIKE @FindText + '%' --Starts by FindText

AND LEN(Item) = 10 --10 digits

AND (ItemNumber = MIN(ItemNumber) OVER(PARTITION BY Matched) OR @IsFreeText = 1) --Change either the first or all

THEN STUFF( Item, 1, LEN(@FindText), @ReplaceText)

ELSE Item END

FROM dbo.PatternSplitCM( @Text, '%[0-9]%')

ORDER BY ItemNumber

FOR XML PATH(''), TYPE).value('.', 'nvarchar(4000)') AS newOrder

go

with x as (

select '2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT

union

select '2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT

union

select '2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT

union

select '2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT

union

select '2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT

)

selectDOCNUMBER

,dbo.OrderReplace(DOCNUMBER, '998', '999', 0)

,DN.newOrder

,DOCFREETEXT

,dbo.OrderReplace(DOCFREETEXT, '998', '999',1)

,DFT.newOrder

fromx

CROSS APPLY dbo.itvf_OrderReplace(DOCNUMBER, '998', '999', 0) DN

CROSS APPLY dbo.itvf_OrderReplace(DOCFREETEXT, '998', '999', 1) DFT

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

FridayNightGiant SSCertifiable Points: 7151 More actions · Answer 4

Thank you Luis. I will have a play about with this so I understand fully what it is doing but after a first test, on a million rows, using only the free text column (on some pretty decrepit hardware) your script took 17 seconds whereas my while loop took 1 min 7 seconds.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 5

You're welcome, feel free to ask any questions that you have.

I also created an initial version that lacked versatility and replaced only the first order number. It was like this:

ISNULL( STUFF( DOCNUMBER, PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', '-' + DOCNUMBER + '-'), 3, '999'), DOCNUMBER)

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

ScottPletcher SSC Guru Points: 101103 More actions · Answer 6

Here's a method without an external function, just out of curiosity mostly about how it would perform vs. other methods. It uses a "standard" tally table. This method does require a fixed number of possible substitutions: for now I used 6, but more could be added.

with x as (

select 1 as rec#,'2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT

union

select 2,'2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT

union

select 3,'2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT

union

select 4,'2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT

union

select 5,'2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT

)

select

isnull(docnumber1, docnumber) as docnumber,

coalesce(docfreetext6, docfreetext5, docfreetext4, docfreetext3, docfreetext2, docfreetext1) AS docfreetext

--,*

fromx

cross apply (

select PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', DOCNUMBER + '.') AS DOCNUMBER_START_OF_ORDER_NUMBER

) as ca_DOCNUMBER_START_OF_ORDER_NUMBER

cross apply (

select case when DOCNUMBER_START_OF_ORDER_NUMBER >= 1 then STUFF(docnumber, DOCNUMBER_START_OF_ORDER_NUMBER + 3, 1, '9') end AS DOCNUMBER1

) as ca_DOCNUMBER1

outer apply (

select top (1) tally AS tally1, STUFF(DOCFREETEXT, tally + 3, 1, '9') AS DOCFREETEXT1

from dbo.tally

where PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT, tally, 8000) + '.') > 0

) as ca_DOCFREETEXT1

outer apply (

select top (1) tally AS tally2, case when tally1 > 0 then STUFF(DOCFREETEXT1, tally + 3, 1, '9') end AS DOCFREETEXT2

from dbo.tally

where

tally >= tally1 + 10 and

PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT1, tally, 8000) + '.') > 0

) as ca_DOCFREETEXT2

outer apply (

select top (1) tally AS tally3, case when tally2 > 0 then STUFF(DOCFREETEXT2, tally + 3, 1, '9') end AS DOCFREETEXT3

from dbo.tally

where

tally >= tally2 + 10 and

PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT2, tally, 8000) + '.') > 0

) as ca_DOCFREETEXT3

outer apply (

select top (1) tally AS tally4, case when tally3 > 0 then STUFF(DOCFREETEXT3, tally + 3, 1, '9') end AS DOCFREETEXT4

from dbo.tally

where

tally >= tally3 + 10 and

PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT3, tally, 8000) + '.') > 0

) as ca_DOCFREETEXT4

outer apply (

select top (1) tally AS tally5, case when tally4 > 0 then STUFF(DOCFREETEXT4, tally + 3, 1, '9') end AS DOCFREETEXT5

from dbo.tally

where

tally >= tally4 + 10 and

PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT4, tally, 8000) + '.') > 0

) as ca_DOCFREETEXT5

outer apply (

select top (1) tally AS tally6, case when tally5 > 0 then STUFF(DOCFREETEXT5, tally + 3, 1, '9') end AS DOCFREETEXT6

from dbo.tally

where

tally >= tally5 + 10 and

PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT5, tally, 8000) + '.') > 0

) as ca_DOCFREETEXT6

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

FridayNightGiant SSCertifiable Points: 7151 More actions · Answer 7

ScottPletcher (8/12/2015)
Here's a method without an external function, just out of curiosity mostly about how it would perform vs. other methods. It uses a "standard" tally table. This method does require a fixed number of possible substitutions: for now I used 6, but more could be added.
with x as (
select 1 as rec#,'2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT
union
select 2,'2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT
union
select 3,'2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT
union
select 4,'2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT
union
select 5,'2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT
)
select
isnull(docnumber1, docnumber) as docnumber,
coalesce(docfreetext6, docfreetext5, docfreetext4, docfreetext3, docfreetext2, docfreetext1) AS docfreetext
--,*
fromx
cross apply (
select PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', DOCNUMBER + '.') AS DOCNUMBER_START_OF_ORDER_NUMBER
) as ca_DOCNUMBER_START_OF_ORDER_NUMBER
cross apply (
select case when DOCNUMBER_START_OF_ORDER_NUMBER >= 1 then STUFF(docnumber, DOCNUMBER_START_OF_ORDER_NUMBER + 3, 1, '9') end AS DOCNUMBER1
) as ca_DOCNUMBER1
outer apply (
select top (1) tally AS tally1, STUFF(DOCFREETEXT, tally + 3, 1, '9') AS DOCFREETEXT1
from dbo.tally
where PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT1
outer apply (
select top (1) tally AS tally2, case when tally1 > 0 then STUFF(DOCFREETEXT1, tally + 3, 1, '9') end AS DOCFREETEXT2
from dbo.tally
where
tally >= tally1 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT1, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT2
outer apply (
select top (1) tally AS tally3, case when tally2 > 0 then STUFF(DOCFREETEXT2, tally + 3, 1, '9') end AS DOCFREETEXT3
from dbo.tally
where
tally >= tally2 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT2, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT3
outer apply (
select top (1) tally AS tally4, case when tally3 > 0 then STUFF(DOCFREETEXT3, tally + 3, 1, '9') end AS DOCFREETEXT4
from dbo.tally
where
tally >= tally3 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT3, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT4
outer apply (
select top (1) tally AS tally5, case when tally4 > 0 then STUFF(DOCFREETEXT4, tally + 3, 1, '9') end AS DOCFREETEXT5
from dbo.tally
where
tally >= tally4 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT4, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT5
outer apply (
select top (1) tally AS tally6, case when tally5 > 0 then STUFF(DOCFREETEXT5, tally + 3, 1, '9') end AS DOCFREETEXT6
from dbo.tally
where
tally >= tally5 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT5, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT6

Hi Scott,

How many rows should be in the tally table? With a 100 row tally table it took nearly 4 minutes to process 500,000 records.

Thanks

ScottPletcher SSC Guru Points: 101103 More actions · Answer 8

FridayNightGiant (8/14/2015)
ScottPletcher (8/12/2015)
Here's a method without an external function, just out of curiosity mostly about how it would perform vs. other methods. It uses a "standard" tally table. This method does require a fixed number of possible substitutions: for now I used 6, but more could be added.
with x as (
select 1 as rec#,'2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT
union
select 2,'2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT
union
select 3,'2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT
union
select 4,'2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT
union
select 5,'2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT
)
select
isnull(docnumber1, docnumber) as docnumber,
coalesce(docfreetext6, docfreetext5, docfreetext4, docfreetext3, docfreetext2, docfreetext1) AS docfreetext
--,*
fromx
cross apply (
select PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', DOCNUMBER + '.') AS DOCNUMBER_START_OF_ORDER_NUMBER
) as ca_DOCNUMBER_START_OF_ORDER_NUMBER
cross apply (
select case when DOCNUMBER_START_OF_ORDER_NUMBER >= 1 then STUFF(docnumber, DOCNUMBER_START_OF_ORDER_NUMBER + 3, 1, '9') end AS DOCNUMBER1
) as ca_DOCNUMBER1
outer apply (
select top (1) tally AS tally1, STUFF(DOCFREETEXT, tally + 3, 1, '9') AS DOCFREETEXT1
from dbo.tally
where PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT1
outer apply (
select top (1) tally AS tally2, case when tally1 > 0 then STUFF(DOCFREETEXT1, tally + 3, 1, '9') end AS DOCFREETEXT2
from dbo.tally
where
tally >= tally1 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT1, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT2
outer apply (
select top (1) tally AS tally3, case when tally2 > 0 then STUFF(DOCFREETEXT2, tally + 3, 1, '9') end AS DOCFREETEXT3
from dbo.tally
where
tally >= tally2 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT2, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT3
outer apply (
select top (1) tally AS tally4, case when tally3 > 0 then STUFF(DOCFREETEXT3, tally + 3, 1, '9') end AS DOCFREETEXT4
from dbo.tally
where
tally >= tally3 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT3, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT4
outer apply (
select top (1) tally AS tally5, case when tally4 > 0 then STUFF(DOCFREETEXT4, tally + 3, 1, '9') end AS DOCFREETEXT5
from dbo.tally
where
tally >= tally4 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT4, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT5
outer apply (
select top (1) tally AS tally6, case when tally5 > 0 then STUFF(DOCFREETEXT5, tally + 3, 1, '9') end AS DOCFREETEXT6
from dbo.tally
where
tally >= tally5 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT5, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT6
Hi Scott,
How many rows should be in the tally table? With a 100 row tally table it took nearly 4 minutes to process 500,000 records.
Thanks

As many rows as there are bytes in the string. But Yikes! If it takes that long, "never mind" :hehe:

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".