While loop to replace text - how to re-write to use set processing

  • All,

    I’ve created a script which will do the following.

    Update fields in a database which contain instances of an order number. An order number is defined as a 10 digit numeric sting which beings with 998. The first 3 digits of the order number need to change from 998 to 999. There are two types of fields to update: document number fields which may contain 1 instance of the order number and free text fields which may contain multiple instances of an order number.

    I created a function which accepts the text to be updated, the text to find and the text to replace it with. The function then loops through the sting for instances of 998. For each instance it finds it checks to ensure that it is at the start of a 10 digit string which contains only numeric numbers – if this is the case then it will update the 998 to be 999.

    If the field is free text then it will continue to loop through the string (it will skip forward 10 digits for every order number found) until the end. If the field is not free text then it will exit the script after the first instance found which matches the above criteria, if any.

    I was wondering if someone could point in me in the right direction of achieving this via set processing and not having to loop through every line. I’ve included the function below and some test data.

    Thanks in advance.

    --Create the function we will call in order to do the replace

    if object_id(N'OrderReplace',N'FN') is not null

    drop function dbo.OrderReplace;

    go

    create function dbo.OrderReplace (

    @Textnvarchar(4000),

    @FindTextnvarchar(6),

    @ReplaceTextnvarchar(6),

    @IsFreeTextbit

    )

    returns nvarchar(4000)

    as

    begin

    declare@FindLocation as int,

    @TotalLen as int,

    @FindLen as int,

    @StartPosition as int = 1;

    select @TotalLen = len(@Text);

    select @FindLen = len(@FindText);

    while (@StartPosition < @TotalLen) --while the starting position for the find is less than the total length

    begin

    select @FindLocation = charindex(@FindText, @Text, @StartPosition); --get the location of the first instance of FindText from the start position

    if @FindLocation = 0 -- if nothing found then exit loop

    break;

    --ensure @FindText is the start of a 10 digit numberic string. This way we can be confident that its a maint order no.

    if len(substring(@Text,@FindLocation, 10)) = 10 and isnumeric(substring(@Text,@FindLocation, 10)+'.e0') = 1 --add .e0 to stop isnumeric returning true if a comma or period exists in string already

    begin

    select @Text = stuff(@Text, @FindLocation,len(substring(@Text, @FindLocation, @TotalLen)), @ReplaceText + substring(@Text, @FindLocation+@FindLen, @TotalLen)); --remove 3 charaters from the string at the findlocation and replace with 181

    if @IsFreeText = 1 --if it is a free text then need to loop through the string, otherwise break out of loop

    select @StartPosition = @FindLocation + 9; -- set the new start position by using the find location and adding the number of charaters to skip.

    else

    break;

    end

    else

    select @StartPosition = @FindLocation + @FindLen; -- set the new start position by using the find location and adding the number of charaters to skip.

    end

    return @Text;

    end

    go

    with x as (

    select '2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT

    union

    select '2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT

    union

    select '2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT

    union

    select '2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT

    union

    select '2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT

    )

    selectdbo.OrderReplace(DOCNUMBER, '998', '999', 0)

    ,dbo.OrderReplace(DOCFREETEXT, '998', '999',1)

    fromx

    Expected Output

    DOCNUMBERDOCFREETEXT

    2015/3352424Testing 9997998646

    2015/33563/RA/9997998647 Testing 9997998647

    2015/3356324Testing 9997895645 998

    2015/33564/RA/9994564658 Testing 9994564658,9994564659,9994564660 998

    2015/3999324Testing 9997895646

  • In your expected output i see there is a comma delimiter between strings, is this the way the data is stored or added by you when posting the query?

    if there is a comma delimited value between strings then a simple way to acheive the results is

    -Split the entire set into two data sets

    - Dataset1 contains rows with only a single value for order num ( then simply use the left function to replace as needed)

    - Dataset2 contains rows with a comma delimited string for order numbers , here you can use the replace function (inclusive of the comma and 998 to be repalced with comma and 999)

    Jayanth Kurup[/url]

  • Jayanth_Kurup (8/12/2015)


    In your expected output i see there is a comma delimiter between strings, is this the way the data is stored or added by you when posting the query?

    if there is a comma delimited value between strings then a simple way to acheive the results is

    -Split the entire set into two data sets

    - Dataset1 contains rows with only a single value for order num ( then simply use the left function to replace as needed)

    - Dataset2 contains rows with a comma delimited string for order numbers , here you can use the replace function (inclusive of the comma and 998 to be repalced with comma and 999)

    Thanks for the reply. No the data isn't comma delimited. This was an example of data in a free text field where a user can enter anything they want, they might enter a single order number and some text, they might enter text and multiple order numbers separated by a space,comma,dash etc. so there is no logic to how the order numbers will be in this field.

  • Here's a possible option. Be sure to test what it does and understand it.

    It basically splits the strings to find the different digit portions by using the PatternSplitCM which can be found here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/

    It then joins the strings again using this technique: http://www.sqlservercentral.com/articles/comma+separated+list/71700/

    create function dbo.itvf_OrderReplace (

    @Textnvarchar(4000),

    @FindTextnvarchar(6),

    @ReplaceTextnvarchar(6),

    @IsFreeTextbit

    )

    returns table

    as

    RETURN

    SELECT (SELECT CASE WHEN Matched = 1 --Only digits

    AND Item LIKE @FindText + '%' --Starts by FindText

    AND LEN(Item) = 10 --10 digits

    AND (ItemNumber = MIN(ItemNumber) OVER(PARTITION BY Matched) OR @IsFreeText = 1) --Change either the first or all

    THEN STUFF( Item, 1, LEN(@FindText), @ReplaceText)

    ELSE Item END

    FROM dbo.PatternSplitCM( @Text, '%[0-9]%')

    ORDER BY ItemNumber

    FOR XML PATH(''), TYPE).value('.', 'nvarchar(4000)') AS newOrder

    go

    with x as (

    select '2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT

    union

    select '2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT

    union

    select '2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT

    union

    select '2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT

    union

    select '2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT

    )

    selectDOCNUMBER

    ,dbo.OrderReplace(DOCNUMBER, '998', '999', 0)

    ,DN.newOrder

    ,DOCFREETEXT

    ,dbo.OrderReplace(DOCFREETEXT, '998', '999',1)

    ,DFT.newOrder

    fromx

    CROSS APPLY dbo.itvf_OrderReplace(DOCNUMBER, '998', '999', 0) DN

    CROSS APPLY dbo.itvf_OrderReplace(DOCFREETEXT, '998', '999', 1) DFT

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thank you Luis. I will have a play about with this so I understand fully what it is doing but after a first test, on a million rows, using only the free text column (on some pretty decrepit hardware) your script took 17 seconds whereas my while loop took 1 min 7 seconds.

  • You're welcome, feel free to ask any questions that you have.

    I also created an initial version that lacked versatility and replaced only the first order number. It was like this:

    ISNULL( STUFF( DOCNUMBER, PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', '-' + DOCNUMBER + '-'), 3, '999'), DOCNUMBER)

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Here's a method without an external function, just out of curiosity mostly about how it would perform vs. other methods. It uses a "standard" tally table. This method does require a fixed number of possible substitutions: for now I used 6, but more could be added.

    with x as (

    select 1 as rec#,'2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT

    union

    select 2,'2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT

    union

    select 3,'2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT

    union

    select 4,'2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT

    union

    select 5,'2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT

    )

    select

    isnull(docnumber1, docnumber) as docnumber,

    coalesce(docfreetext6, docfreetext5, docfreetext4, docfreetext3, docfreetext2, docfreetext1) AS docfreetext

    --,*

    fromx

    cross apply (

    select PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', DOCNUMBER + '.') AS DOCNUMBER_START_OF_ORDER_NUMBER

    ) as ca_DOCNUMBER_START_OF_ORDER_NUMBER

    cross apply (

    select case when DOCNUMBER_START_OF_ORDER_NUMBER >= 1 then STUFF(docnumber, DOCNUMBER_START_OF_ORDER_NUMBER + 3, 1, '9') end AS DOCNUMBER1

    ) as ca_DOCNUMBER1

    outer apply (

    select top (1) tally AS tally1, STUFF(DOCFREETEXT, tally + 3, 1, '9') AS DOCFREETEXT1

    from dbo.tally

    where PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT1

    outer apply (

    select top (1) tally AS tally2, case when tally1 > 0 then STUFF(DOCFREETEXT1, tally + 3, 1, '9') end AS DOCFREETEXT2

    from dbo.tally

    where

    tally >= tally1 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT1, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT2

    outer apply (

    select top (1) tally AS tally3, case when tally2 > 0 then STUFF(DOCFREETEXT2, tally + 3, 1, '9') end AS DOCFREETEXT3

    from dbo.tally

    where

    tally >= tally2 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT2, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT3

    outer apply (

    select top (1) tally AS tally4, case when tally3 > 0 then STUFF(DOCFREETEXT3, tally + 3, 1, '9') end AS DOCFREETEXT4

    from dbo.tally

    where

    tally >= tally3 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT3, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT4

    outer apply (

    select top (1) tally AS tally5, case when tally4 > 0 then STUFF(DOCFREETEXT4, tally + 3, 1, '9') end AS DOCFREETEXT5

    from dbo.tally

    where

    tally >= tally4 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT4, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT5

    outer apply (

    select top (1) tally AS tally6, case when tally5 > 0 then STUFF(DOCFREETEXT5, tally + 3, 1, '9') end AS DOCFREETEXT6

    from dbo.tally

    where

    tally >= tally5 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT5, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT6

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (8/12/2015)


    Here's a method without an external function, just out of curiosity mostly about how it would perform vs. other methods. It uses a "standard" tally table. This method does require a fixed number of possible substitutions: for now I used 6, but more could be added.

    with x as (

    select 1 as rec#,'2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT

    union

    select 2,'2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT

    union

    select 3,'2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT

    union

    select 4,'2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT

    union

    select 5,'2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT

    )

    select

    isnull(docnumber1, docnumber) as docnumber,

    coalesce(docfreetext6, docfreetext5, docfreetext4, docfreetext3, docfreetext2, docfreetext1) AS docfreetext

    --,*

    fromx

    cross apply (

    select PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', DOCNUMBER + '.') AS DOCNUMBER_START_OF_ORDER_NUMBER

    ) as ca_DOCNUMBER_START_OF_ORDER_NUMBER

    cross apply (

    select case when DOCNUMBER_START_OF_ORDER_NUMBER >= 1 then STUFF(docnumber, DOCNUMBER_START_OF_ORDER_NUMBER + 3, 1, '9') end AS DOCNUMBER1

    ) as ca_DOCNUMBER1

    outer apply (

    select top (1) tally AS tally1, STUFF(DOCFREETEXT, tally + 3, 1, '9') AS DOCFREETEXT1

    from dbo.tally

    where PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT1

    outer apply (

    select top (1) tally AS tally2, case when tally1 > 0 then STUFF(DOCFREETEXT1, tally + 3, 1, '9') end AS DOCFREETEXT2

    from dbo.tally

    where

    tally >= tally1 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT1, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT2

    outer apply (

    select top (1) tally AS tally3, case when tally2 > 0 then STUFF(DOCFREETEXT2, tally + 3, 1, '9') end AS DOCFREETEXT3

    from dbo.tally

    where

    tally >= tally2 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT2, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT3

    outer apply (

    select top (1) tally AS tally4, case when tally3 > 0 then STUFF(DOCFREETEXT3, tally + 3, 1, '9') end AS DOCFREETEXT4

    from dbo.tally

    where

    tally >= tally3 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT3, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT4

    outer apply (

    select top (1) tally AS tally5, case when tally4 > 0 then STUFF(DOCFREETEXT4, tally + 3, 1, '9') end AS DOCFREETEXT5

    from dbo.tally

    where

    tally >= tally4 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT4, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT5

    outer apply (

    select top (1) tally AS tally6, case when tally5 > 0 then STUFF(DOCFREETEXT5, tally + 3, 1, '9') end AS DOCFREETEXT6

    from dbo.tally

    where

    tally >= tally5 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT5, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT6

    Hi Scott,

    How many rows should be in the tally table? With a 100 row tally table it took nearly 4 minutes to process 500,000 records.

    Thanks

  • FridayNightGiant (8/14/2015)


    ScottPletcher (8/12/2015)


    Here's a method without an external function, just out of curiosity mostly about how it would perform vs. other methods. It uses a "standard" tally table. This method does require a fixed number of possible substitutions: for now I used 6, but more could be added.

    with x as (

    select 1 as rec#,'2015/3356324' as DOCNUMBER, 'Testing 9987895645 998' as DOCFREETEXT

    union

    select 2,'2015/3352424' as DOCNUMBER, 'Testing 9987998646' as DOCFREETEXT

    union

    select 3,'2015/33563/RA/9987998647' as DOCNUMBER, 'Testing 9987998647' as DOCFREETEXT

    union

    select 4,'2015/33564/RA/9984564658' as DOCNUMBER, 'Testing 9984564658,9984564659,9984564660 998' as DOCFREETEXT

    union

    select 5,'2015/3998324' as DOCNUMBER, 'Testing 9987895646' as DOCFREETEXT

    )

    select

    isnull(docnumber1, docnumber) as docnumber,

    coalesce(docfreetext6, docfreetext5, docfreetext4, docfreetext3, docfreetext2, docfreetext1) AS docfreetext

    --,*

    fromx

    cross apply (

    select PATINDEX('%[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', DOCNUMBER + '.') AS DOCNUMBER_START_OF_ORDER_NUMBER

    ) as ca_DOCNUMBER_START_OF_ORDER_NUMBER

    cross apply (

    select case when DOCNUMBER_START_OF_ORDER_NUMBER >= 1 then STUFF(docnumber, DOCNUMBER_START_OF_ORDER_NUMBER + 3, 1, '9') end AS DOCNUMBER1

    ) as ca_DOCNUMBER1

    outer apply (

    select top (1) tally AS tally1, STUFF(DOCFREETEXT, tally + 3, 1, '9') AS DOCFREETEXT1

    from dbo.tally

    where PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT1

    outer apply (

    select top (1) tally AS tally2, case when tally1 > 0 then STUFF(DOCFREETEXT1, tally + 3, 1, '9') end AS DOCFREETEXT2

    from dbo.tally

    where

    tally >= tally1 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT1, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT2

    outer apply (

    select top (1) tally AS tally3, case when tally2 > 0 then STUFF(DOCFREETEXT2, tally + 3, 1, '9') end AS DOCFREETEXT3

    from dbo.tally

    where

    tally >= tally2 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT2, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT3

    outer apply (

    select top (1) tally AS tally4, case when tally3 > 0 then STUFF(DOCFREETEXT3, tally + 3, 1, '9') end AS DOCFREETEXT4

    from dbo.tally

    where

    tally >= tally3 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT3, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT4

    outer apply (

    select top (1) tally AS tally5, case when tally4 > 0 then STUFF(DOCFREETEXT4, tally + 3, 1, '9') end AS DOCFREETEXT5

    from dbo.tally

    where

    tally >= tally4 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT4, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT5

    outer apply (

    select top (1) tally AS tally6, case when tally5 > 0 then STUFF(DOCFREETEXT5, tally + 3, 1, '9') end AS DOCFREETEXT6

    from dbo.tally

    where

    tally >= tally5 + 10 and

    PATINDEX('[^0-9]998[0-9][0-9][0-9][0-9][0-9][0-9][0-9][^0-9]%', SUBSTRING(DOCFREETEXT5, tally, 8000) + '.') > 0

    ) as ca_DOCFREETEXT6

    Hi Scott,

    How many rows should be in the tally table? With a 100 row tally table it took nearly 4 minutes to process 500,000 records.

    Thanks

    As many rows as there are bytes in the string. But Yikes! If it takes that long, "never mind" :hehe:

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply