how to remove numbers from strings?

  • Max Yasnytskyy (10/18/2007)[hrAre you trying to say that to solve this sort of problem any additional table is required?

    Heck no... there are dozens of slower ways to do it 😀 But, like Serqiy said, a nice little static utility table like this sure does make solving many "impossible" or "gonna need a cursor" problems simply melt away.

    Build a permanent Tally table... you'll love it once you learn to use it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Max Yasnytskyy (10/17/2007)


    I have found an interesintg function in msdb, seems like it is working much better

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE FUNCTION [dbo].[udfTRANSELATE] (@mixedString AS varchar(max))

    RETURNS varchar(max)

    as

    BEGIN

    DECLARE @resultString varchar(max), @tmpString char(1), @i int, @strLen int

    SELECT @resultString='', @i=0, @strLen=len(@mixedString)

    WHILE @strLen>=@i

    BEGIN

    SELECT @tmpString=substring (@mixedString,@i/*character index*/,1/*one character*/)

    if msdb.dbo.ConvertToInt (@tmpString,11,11)=11

    SELECT @resultString=@resultString+@tmpString

    SELECT @i=@i+1

    END

    RETURN @resultString

    END

    this is what i needed. thanx.

    the first solution didnt work. because of the Tally i didnt implement.

    i'm using 2005. sorry for the cunfusion, didnt notice wrong forum.

    ps.

    meanwhile the client "revised" some specs, so i do need these numbers now. but i still need to pull them from the original one.

    so it will be somethimg like that:

    if msdb.dbo.ConvertToInt (@tmpString,11,11)=11

    SELECT @resultString=@resultString+@tmpString

    else

    SELECT @numberString=@numberString+@tmpString

    SELECT @i=@i+1

    i dont need it as a function because i need it only in one place in the whole code.

  • the first solution didnt work. because of the Tally i didnt implement...

    ps.

    meanwhile the client "revised" some specs, so i do need these numbers now. but i still need to pull them from the original one.

    so it will be somethimg like that:

    if msdb.dbo.ConvertToInt (@tmpString,11,11)=11

    SELECT @resultString=@resultString+@tmpString

    else

    SELECT @numberString=@numberString+@tmpString

    SELECT @i=@i+1

    i dont need it as a function because i need it only in one place in the whole code.

    You'll still need it as a function... and, if you had invoked the Tally table, it would be easy. 😉 The Tally table makes a great deal of such tasks as simple as falling off your chair when you realize what you can do with it 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • WEll - since we're now throwing in 2005 - this would be the perfect time to create your dbo.RegexReplace CLR function, as so:

    Imports System

    Imports System.Data

    Imports System.Data.SqlClient

    Imports System.Data.SqlTypes

    Imports Microsoft.SqlServer.Server

    Imports System.Text.RegularExpressions

    Partial Public Class UserDefinedFunctions

    <SqlFunction> _

    Public Shared Function RegexReplace(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal rep As SqlString) As SqlString

    ' Add your code here

    Dim rex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern.Value, optionS)

    Return New SqlString(rex.Replace(New String(input.Value), rep.Value))

    End Function

    End Class

    And then the whole "cleaning process" becomes:

    declare @dirtystring varchar(500)

    declare @cleanstring varchar(500)

    declare @rex varchar(100)

    select @dirtystring='123k4biodf-k123l4h123098d-09dfknqwerklhsdfo0er98tqwer/;/;leriuqwer-t0'

    select @rex='[^0-9]'

    select @cleanstring=dbo.regexreplace(@dirtystring,@rex,'')

    select @cleanstring

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff Moden (10/18/2007)


    the first solution didnt work. because of the Tally i didnt implement...

    ps.

    meanwhile the client "revised" some specs, so i do need these numbers now. but i still need to pull them from the original one.

    so it will be somethimg like that:

    if msdb.dbo.ConvertToInt (@tmpString,11,11)=11

    SELECT @resultString=@resultString+@tmpString

    else

    SELECT @numberString=@numberString+@tmpString

    SELECT @i=@i+1

    i dont need it as a function because i need it only in one place in the whole code.

    You'll still need it as a function... and, if you had invoked the Tally table, it would be easy. 😉 The Tally table makes a great deal of such tasks as simple as falling off your chair when you realize what you can do with it 😀

    for now i'm happy if it just works. and it doesnt slow the whole stored procedure considerably.

    maybe later i'll learn about those tally tables. 🙂

  • Now, that looks like a good use for a CLR... how's the performance? (I can't test it... don't have 2k5 available)...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I don't have a HUGE database on hand with relevant data, but on 50K records...maybe a second.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (10/18/2007)


    I don't have a HUGE database on hand with relevant data, but on 50K records...maybe a second.

    Thanks, Matt. Like I've said so many times, I don't have 2k5 to test on (gotta break down and install the Dev version)... Would you be willing to try it on my "performance test table"? This script builds a million row test table... run the CLR on "SomeCSV" column, please... I didn't include any non-breaking spaces in the code this time... 😉

    --===== Create and populate a 1,000,000 row test table.

    -- Column RowNum has a range of 1 to 1,000,000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeString" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeNumber has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Takes about 80 seconds to execute.

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1),

    SomeInt = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),

    SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))

    + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),

    SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (RowNum)

    Sure do appreciate it... these types of tests help me to know which direction to steer especially since your CLR is one of the few uses I've seen that even come close to justifying the use of a CLR... 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ACtually - I'm way ahead of ya. That last response felt a bit like a cop-out, so I went and built my own "big table. For lack of a better way to do it - I decided to go out and "clean" 1M NEWID()'s of all of their numeric characters.

    For anyone who looked at my earlier post - one of the CLR instructions got "whacked" from the first paste. Grab it again. Testing on a XP machine running Dev.Ed. (1GB Total RAM on the machine), so there might be some amount of RAM-bound problems with the result, but...I think they speak for themselves anyway.RAM doesn't seem to be a HUGE issue, no spikes for it at any phase.

    Testing script

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[tbltest](

    [gid] [uniqueidentifier] NOT NULL CONSTRAINT [DF_tbltest_gid] DEFAULT (newid()),

    [num] [bigint] NOT NULL

    ) ON [PRIMARY]

    CREATE UNIQUE CLUSTERED INDEX tbltest_Clust on tbltest(gid)

    --fill with data

    insert tbltest (num)

    select t2.num from master.dbo.spt_values cross join spt_values t2 --5.5M records, now with a GUID

    --now - try the test

    use test

    go

    declare @t varchar(100)

    declare @g datetime

    --just get a count of our "victims"

    select @g=getdate()

    select count(*) from t2 where cast(gid as varchar(100)) between 'a' and 'd'

    select 'regular count',datediff(ms,@g,getdate())

    --Regex method - don't DISPLAY the update - just cycle through them

    select @g=getdate()

    select @t=dbo.regexreplace(gid,'[0-9]','') from t2

    where cast(gid as varchar(100)) between 'a' and 'd'

    select 'regex replace',datediff(ms,@g,getdate())

    --Derek's method - don't DISPLAY the update - just cycle through them

    select @g=getdate()

    select @t=dbo.udfTRANSELATE(gid) from t2

    where cast(gid as varchar(100)) between 'a' and 'd'

    select 'dereks solution', datediff(ms,@g,getdate())

    results:

    Count: 1,033,762

    Regular count 6,483 (6s - as you'd expect)

    Regex replace 91,486 (1 min. 31.486 s)

    dereks solution 2,512,687 (41 min.52.687 s)

    Improvement 27 times faster

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt, just for fun - compare the memory usage when you run CLR udf and udfTRANSELATE

  • sqlServer had 256MB in use at all times. Didn't ask for more (100MB stayed "free" the whole time).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • As a general remark - i had a problem with CLR function which was executed about 200000 times a day and was building XML strings up to 300 characters and then using them as commands when remoting to the custom message queue server. SQL Server was running on 16 CPU x64 server with 16 GB or RAM.

    Because of the way .net garbage collector works and intensive processing of strings, which are as we all know immutable and every modification of the string essentially creates the new string, .net bit of SQL server was running out of memory every so often, freezing the whole SQL instance for up to 2 minutes.

  • Ok everyone... pay very close attention... I'm going to say it again until you get it...

    [font="Arial Black"]CLR's are for people who don't know T-SQL[/font]

    First, Matt... thank you, Sir, for providing the CLR test. Absolutely none of my current piss off about people insisting that CLR's are actually useful is directed at you. You've been most helpful in testing that which I could not. Thank you again.

    On to war... 😉

    As a review... Matt's stats (I'm a poet and don't know it :Whistling: ) from his fine CLR experiment are as follows; notice the duration for the "Regex Replace" which is the result of the CLR used...

    results:

    Count: 1,033,762

    Regular count 6,483 (6s - as you'd expect)

    Regex replace 91,486 (1 min. 31.486 s)

    dereks solution 2,512,687 (41 min.52.687 s)

    Improvement 27 times faster

    Now, there are some minor differences between the system tables in SQL Server 2k5 (which Matt used) and SQL Sever 2k, but I've replicated his test table and run a test on 5.5 Million rows just like he did. The difference is, I used my Tally Table driven function... as a reminder, here's the code for that function...

    CREATE FUNCTION fnDeleteNumbers (@String VARCHAR(8000))

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @Return VARCHAR(8000)

    SET @Return = ''

    SELECT @Return = @Return+SUBSTRING(@String,N,1)

    FROM dbo.Tally WITH (NOLOCK)

    WHERE N <= LEN(@String)

    AND SUBSTRING(@String,N,1) LIKE '%[^0-9]%'

    RETURN @Return

    END

    ...and here's the test code I used... builds the same size table with the same columns, the same data types, and the same index...

    --===== Setup the environment

    SET NOCOUNT ON

    IF OBJECT_ID('dbo.tblTest','U') IS NOT NULL

    DROP TABLE dbo.tblTest

    --===== Declare a couple of local variables like Matt did

    DECLARE @T VARCHAR(100)

    DECLARE @G DATETIME

    --===== Create the test table (using SQL Server 2000)

    SET @G = GETDATE()

    SELECT TOP 5500000

    IDENTITY(BIGINT,1,1) AS Num,

    NEWID() AS GID

    INTO dbo.TblTest

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    SELECT CONVERT(CHAR(12),GETDATE()-@G,114) + ' Time to create test table (hh:mi:ss:mmm)'

    --===== Create the same index as Matt did

    SET @G = GETDATE()

    CREATE UNIQUE CLUSTERED INDEX TblTest_Clust ON TblTest(GID)

    SELECT CONVERT(CHAR(12),GETDATE()-@G,114) + ' Time to create clustered index (hh:mi:ss:mmm)'

    --===== As Matt said, "Get a count of the victims"

    SET @G = GETDATE()

    SELECT COUNT(*) AS [Count of "Victims"] FROM TblTest

    WHERE CAST(GID AS VARCHAR(100)) BETWEEN 'A' AND 'D'

    SELECT CONVERT(CHAR(12),GETDATE()-@G,114) + ' Time to create clustered index (hh:mi:ss:mmm)'

    --===== Start the duration timer and do the test...

    -- As Matt said, "don't DISPLAY the update - just cycle through them"

    SET @G = GETDATE()

    SELECT @T = dbo.fnDeleteNumbers(GID)

    FROM TblTest WHERE CAST(GID AS VARCHAR(100)) BETWEEN 'A' AND 'D'

    SELECT CONVERT(CHAR(12),GETDATE()-@G,114) + ' Time to Tally Table replace (hh:mi:ss:mmm)'

    Now... this is where you really need to pay attention... here's the results from the code run above... pay really close attention to the RED text and compare it to the duration for Matt's "Regex Replace" time from the beginning of this message...

    -----------------------------------------------------

    00:00:21:203 Time to create test table (hh:mi:ss:mmm)

    ----------------------------------------------------------

    00:00:54:267 Time to create clustered index (hh:mi:ss:mmm)

    Count of "Victims"

    ------------------

    1030347

    ----------------------------------------------------------

    00:00:07:233 Time to create clustered index (hh:mi:ss:mmm)

    -------------------------------------------------------

    [font="Arial Black"]00:01:29:093 Time to Tally Table replace (hh:mi:ss:mmm)[/font]

    In case you can't figure it out... the T-SQL/Tally Table solution beat the CLR solution by almost 2 and a half seconds!!!!

    So, let me say it one more time just in case it hasn't sunk in yet...

    [font="Arial Black"]CLR's are for people who don't know T-SQL[/font] 😛 😀 😉

    And, allow me to introduce my new "tag line":

    "Before you think outside the box, consider how good the box you're in is!" :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • so...

    Your saying CLR's are for people who know sql, or people who don't know sql...I'm confused. 😉

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Max Yasnytskyy (10/18/2007)

    Because of the way .net garbage collector works and intensive processing of strings, which are as we all know immutable and every modification of the string essentially creates the new string, .net bit of SQL server was running out of memory every so often, freezing the whole SQL instance for up to 2 minutes.

    I figured that was the general direction you were heading in. A lot of that's improved with the newer updates to the framework. Certainly after them and SQL2005 SP2 - seems to not happen a lot.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 31 through 45 (of 172 total)

You must be logged in to reply to this topic. Login to reply