August 12, 2015 at 6:17 am
All,
I’ve created a script which will do the following.
Update fields in a database which contain instances of an order number. An order number is defined as a 10 digit numeric sting which beings with 998. The first 3 digits of the order number need to change from 998 to 999. There are two types of fields to update: document number fields which may contain 1 instance of the order number and free text fields which may contain multiple instances of an order number.
I created a function which accepts the text to be updated, the text to find and the text to replace it with. The function then loops through the sting for instances of 998. For each instance it finds it checks to ensure that it is at the start of a 10 digit string which contains only numeric numbers – if this is the case then it will update the 998 to be 999.
If the field is free text then it will continue to loop through the string (it will skip forward 10 digits for every order number found) until the end. If the field is not free text then it will exit the script after the first instance found which matches the above criteria, if any.
I was wondering if someone could point in me in the right direction of achieving this via set processing and not having to loop through every line. I’ve included the function below and some test data.
Thanks in advance.
--Create the function we will call in order to do the replace
if object_id(N'OrderReplace',N'FN') is not null
drop function dbo.OrderReplace;
go
create function dbo.OrderReplace (
@Textnvarchar(4000),
@FindTextnvarchar(6),
@ReplaceTextnvarchar(6),
@IsFreeTextbit
)
returns nvarchar(4000)
as
begin
declare@FindLocation as int,
@TotalLen as int,
@FindLen as int,
@StartPosition as int = 1;
select @TotalLen = len(@Text);
select @FindLen = len(@FindText);
while (@StartPosition < @TotalLen) --while the starting position for the find is less than the total length
begin
select @FindLocation = charindex(@FindText, @Text, @StartPosition); --get the location of the first instance of FindText from the start position
if @FindLocation = 0 -- if nothing found then exit loop
break;
--ensure @FindText is the start of a 10 digit numberic string. This way we can be confident that its a maint order no.
if len(substring(@Text,@FindLocation, 10)) = 10 and isnumeric(substring(@Text,@FindLocation, 10)+'.e0') = 1 --add .e0 to stop isnumeric returning true if a comma or period exists in string already
begin
select @Text = stuff(@Text, @FindLocation,len(substring(@Text, @FindLocation, @TotalLen)), @ReplaceText + substring(@Text, @FindLocation+@FindLen, @TotalLen)); --remove 3 charaters from the string at the findlocation and replace with 181
if @IsFreeText = 1 --if it is a free text then need to loop through the string, otherwise break out of loop
select @StartPosition = @FindLocation + 9; -- set the new start position by using the find location and adding the number of charaters to skip.
else
break;
end
else
select @StartPosition = @FindLocation + @FindLen; -- set the new start position by using the find location and adding the number of charaters to skip.
end
return @Text;
end
go
with x as (
select '2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT
union
select '2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT
union
select '2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT
union
select '2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT
union
select '2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT
)
selectdbo.OrderReplace(DOCNUMBER, '998', '999', 0)
,dbo.OrderReplace(DOCFREETEXT, '998', '999',1)
fromx
Expected Output
DOCNUMBERDOCFREETEXT
2015/3352424Testing 9997998646
2015/33563/RA/9997998647 Testing 9997998647
2015/3356324Testing 9997895645 998
2015/33564/RA/9994564658 Testing 9994564658,9994564659,9994564660 998
2015/3999324Testing 9997895646
August 12, 2015 at 6:25 am
In your expected output i see there is a comma delimiter between strings, is this the way the data is stored or added by you when posting the query?
if there is a comma delimited value between strings then a simple way to acheive the results is
-Split the entire set into two data sets
- Dataset1 contains rows with only a single value for order num ( then simply use the left function to replace as needed)
- Dataset2 contains rows with a comma delimited string for order numbers , here you can use the replace function (inclusive of the comma and 998 to be repalced with comma and 999)
August 12, 2015 at 6:33 am
Jayanth_Kurup (8/12/2015)
In your expected output i see there is a comma delimiter between strings, is this the way the data is stored or added by you when posting the query?if there is a comma delimited value between strings then a simple way to acheive the results is
-Split the entire set into two data sets
- Dataset1 contains rows with only a single value for order num ( then simply use the left function to replace as needed)
- Dataset2 contains rows with a comma delimited string for order numbers , here you can use the replace function (inclusive of the comma and 998 to be repalced with comma and 999)
Thanks for the reply. No the data isn't comma delimited. This was an example of data in a free text field where a user can enter anything they want, they might enter a single order number and some text, they might enter text and multiple order numbers separated by a space,comma,dash etc. so there is no logic to how the order numbers will be in this field.
August 12, 2015 at 7:01 am
Here's a possible option. Be sure to test what it does and understand it.
It basically splits the strings to find the different digit portions by using the PatternSplitCM which can be found here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/
It then joins the strings again using this technique: http://www.sqlservercentral.com/articles/comma+separated+list/71700/
create function dbo.itvf_OrderReplace (
@Textnvarchar(4000),
@FindTextnvarchar(6),
@ReplaceTextnvarchar(6),
@IsFreeTextbit
)
returns table
as
RETURN
SELECT (SELECT CASE WHEN Matched = 1 --Only digits
AND Item LIKE @FindText + '%' --Starts by FindText
AND LEN(Item) = 10 --10 digits
AND (ItemNumber = MIN(ItemNumber) OVER(PARTITION BY Matched) OR @IsFreeText = 1) --Change either the first or all
THEN STUFF( Item, 1, LEN(@FindText), @ReplaceText)
ELSE Item END
FROM dbo.PatternSplitCM( @Text, '%[0-9]%')
ORDER BY ItemNumber
FOR XML PATH(''), TYPE).value('.', 'nvarchar(4000)') AS newOrder
go
with x as (
select '2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT
union
select '2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT
union
select '2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT
union
select '2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT
union
select '2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT
)
selectDOCNUMBER
,dbo.OrderReplace(DOCNUMBER, '998', '999', 0)
,DN.newOrder
,DOCFREETEXT
,dbo.OrderReplace(DOCFREETEXT, '998', '999',1)
,DFT.newOrder
fromx
CROSS APPLY dbo.itvf_OrderReplace(DOCNUMBER, '998', '999', 0) DN
CROSS APPLY dbo.itvf_OrderReplace(DOCFREETEXT, '998', '999', 1) DFT
August 12, 2015 at 7:51 am
Thank you Luis. I will have a play about with this so I understand fully what it is doing but after a first test, on a million rows, using only the free text column (on some pretty decrepit hardware) your script took 17 seconds whereas my while loop took 1 min 7 seconds.
August 12, 2015 at 7:58 am
You're welcome, feel free to ask any questions that you have.
I also created an initial version that lacked versatility and replaced only the first order number. It was like this:
ISNULL( STUFF( DOCNUMBER, PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', '-' + DOCNUMBER + '-'), 3, '999'), DOCNUMBER)
August 12, 2015 at 12:04 pm
Here's a method without an external function, just out of curiosity mostly about how it would perform vs. other methods. It uses a "standard" tally table. This method does require a fixed number of possible substitutions: for now I used 6, but more could be added.
with x as (
select 1 as rec#,'2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT
union
select 2,'2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT
union
select 3,'2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT
union
select 4,'2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT
union
select 5,'2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT
)
select
isnull(docnumber1, docnumber) as docnumber,
coalesce(docfreetext6, docfreetext5, docfreetext4, docfreetext3, docfreetext2, docfreetext1) AS docfreetext
--,*
fromx
cross apply (
select PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', DOCNUMBER + '.') AS DOCNUMBER_START_OF_ORDER_NUMBER
) as ca_DOCNUMBER_START_OF_ORDER_NUMBER
cross apply (
select case when DOCNUMBER_START_OF_ORDER_NUMBER >= 1 then STUFF(docnumber, DOCNUMBER_START_OF_ORDER_NUMBER + 3, 1, '9') end AS DOCNUMBER1
) as ca_DOCNUMBER1
outer apply (
select top (1) tally AS tally1, STUFF(DOCFREETEXT, tally + 3, 1, '9') AS DOCFREETEXT1
from dbo.tally
where PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT1
outer apply (
select top (1) tally AS tally2, case when tally1 > 0 then STUFF(DOCFREETEXT1, tally + 3, 1, '9') end AS DOCFREETEXT2
from dbo.tally
where
tally >= tally1 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT1, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT2
outer apply (
select top (1) tally AS tally3, case when tally2 > 0 then STUFF(DOCFREETEXT2, tally + 3, 1, '9') end AS DOCFREETEXT3
from dbo.tally
where
tally >= tally2 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT2, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT3
outer apply (
select top (1) tally AS tally4, case when tally3 > 0 then STUFF(DOCFREETEXT3, tally + 3, 1, '9') end AS DOCFREETEXT4
from dbo.tally
where
tally >= tally3 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT3, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT4
outer apply (
select top (1) tally AS tally5, case when tally4 > 0 then STUFF(DOCFREETEXT4, tally + 3, 1, '9') end AS DOCFREETEXT5
from dbo.tally
where
tally >= tally4 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT4, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT5
outer apply (
select top (1) tally AS tally6, case when tally5 > 0 then STUFF(DOCFREETEXT5, tally + 3, 1, '9') end AS DOCFREETEXT6
from dbo.tally
where
tally >= tally5 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT5, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT6
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 14, 2015 at 4:18 am
ScottPletcher (8/12/2015)
Here's a method without an external function, just out of curiosity mostly about how it would perform vs. other methods. It uses a "standard" tally table. This method does require a fixed number of possible substitutions: for now I used 6, but more could be added.
with x as (
select 1 as rec#,'2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT
union
select 2,'2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT
union
select 3,'2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT
union
select 4,'2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT
union
select 5,'2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT
)
select
isnull(docnumber1, docnumber) as docnumber,
coalesce(docfreetext6, docfreetext5, docfreetext4, docfreetext3, docfreetext2, docfreetext1) AS docfreetext
--,*
fromx
cross apply (
select PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', DOCNUMBER + '.') AS DOCNUMBER_START_OF_ORDER_NUMBER
) as ca_DOCNUMBER_START_OF_ORDER_NUMBER
cross apply (
select case when DOCNUMBER_START_OF_ORDER_NUMBER >= 1 then STUFF(docnumber, DOCNUMBER_START_OF_ORDER_NUMBER + 3, 1, '9') end AS DOCNUMBER1
) as ca_DOCNUMBER1
outer apply (
select top (1) tally AS tally1, STUFF(DOCFREETEXT, tally + 3, 1, '9') AS DOCFREETEXT1
from dbo.tally
where PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT1
outer apply (
select top (1) tally AS tally2, case when tally1 > 0 then STUFF(DOCFREETEXT1, tally + 3, 1, '9') end AS DOCFREETEXT2
from dbo.tally
where
tally >= tally1 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT1, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT2
outer apply (
select top (1) tally AS tally3, case when tally2 > 0 then STUFF(DOCFREETEXT2, tally + 3, 1, '9') end AS DOCFREETEXT3
from dbo.tally
where
tally >= tally2 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT2, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT3
outer apply (
select top (1) tally AS tally4, case when tally3 > 0 then STUFF(DOCFREETEXT3, tally + 3, 1, '9') end AS DOCFREETEXT4
from dbo.tally
where
tally >= tally3 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT3, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT4
outer apply (
select top (1) tally AS tally5, case when tally4 > 0 then STUFF(DOCFREETEXT4, tally + 3, 1, '9') end AS DOCFREETEXT5
from dbo.tally
where
tally >= tally4 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT4, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT5
outer apply (
select top (1) tally AS tally6, case when tally5 > 0 then STUFF(DOCFREETEXT5, tally + 3, 1, '9') end AS DOCFREETEXT6
from dbo.tally
where
tally >= tally5 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT5, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT6
Hi Scott,
How many rows should be in the tally table? With a 100 row tally table it took nearly 4 minutes to process 500,000 records.
Thanks
August 14, 2015 at 7:31 am
FridayNightGiant (8/14/2015)
ScottPletcher (8/12/2015)
Here's a method without an external function, just out of curiosity mostly about how it would perform vs. other methods. It uses a "standard" tally table. This method does require a fixed number of possible substitutions: for now I used 6, but more could be added.
with x as (
select 1 as rec#,'2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT
union
select 2,'2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT
union
select 3,'2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT
union
select 4,'2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT
union
select 5,'2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT
)
select
isnull(docnumber1, docnumber) as docnumber,
coalesce(docfreetext6, docfreetext5, docfreetext4, docfreetext3, docfreetext2, docfreetext1) AS docfreetext
--,*
fromx
cross apply (
select PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', DOCNUMBER + '.') AS DOCNUMBER_START_OF_ORDER_NUMBER
) as ca_DOCNUMBER_START_OF_ORDER_NUMBER
cross apply (
select case when DOCNUMBER_START_OF_ORDER_NUMBER >= 1 then STUFF(docnumber, DOCNUMBER_START_OF_ORDER_NUMBER + 3, 1, '9') end AS DOCNUMBER1
) as ca_DOCNUMBER1
outer apply (
select top (1) tally AS tally1, STUFF(DOCFREETEXT, tally + 3, 1, '9') AS DOCFREETEXT1
from dbo.tally
where PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT1
outer apply (
select top (1) tally AS tally2, case when tally1 > 0 then STUFF(DOCFREETEXT1, tally + 3, 1, '9') end AS DOCFREETEXT2
from dbo.tally
where
tally >= tally1 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT1, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT2
outer apply (
select top (1) tally AS tally3, case when tally2 > 0 then STUFF(DOCFREETEXT2, tally + 3, 1, '9') end AS DOCFREETEXT3
from dbo.tally
where
tally >= tally2 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT2, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT3
outer apply (
select top (1) tally AS tally4, case when tally3 > 0 then STUFF(DOCFREETEXT3, tally + 3, 1, '9') end AS DOCFREETEXT4
from dbo.tally
where
tally >= tally3 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT3, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT4
outer apply (
select top (1) tally AS tally5, case when tally4 > 0 then STUFF(DOCFREETEXT4, tally + 3, 1, '9') end AS DOCFREETEXT5
from dbo.tally
where
tally >= tally4 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT4, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT5
outer apply (
select top (1) tally AS tally6, case when tally5 > 0 then STUFF(DOCFREETEXT5, tally + 3, 1, '9') end AS DOCFREETEXT6
from dbo.tally
where
tally >= tally5 + 10 and
PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT5, tally, 8000) + '.') > 0
) as ca_DOCFREETEXT6
Hi Scott,
How many rows should be in the tally table? With a 100 row tally table it took nearly 4 minutes to process 500,000 records.
Thanks
As many rows as there are bytes in the string. But Yikes! If it takes that long, "never mind" :hehe:
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply