Please Help with Strings

  • Hi, I was asked to help a newbie on a wierd request.  There is a column in a table having characters like letters, numbers, brackets, single quotes, double quotes, etc.  He wants to remove all characters but letters only.  Can we build a query removing characters not like letters from a to z?  In other words, keep letters and delete other characters?  I am a pure SQL server admin with some light T-SQL skills so this looks like a big problem for me. 

     

    Thanks a lot.

  • Try this:

    SELECT TOP 255

    Identity (int, 1,1) as i

    INTO #Numbers

    from syscolumns C1, syscolumns C2

    declare @Chars table (ToRemove int)

    INSERT INTO @Chars (ToRemove)

    select i from #NUMBERS

    where CHAR(i) not like '[A-Z]' and CHAR(i) not like '[0-9]'

    declare @String as nvarchar(50)

    select @String = 'Initial , $%&^*

     String - '

    select @String = replace (@String, char(ToRemove), '')

    FROM @Chars

    select @String

    drop table #Numbers

     

    _____________
    Code for TallyGenerator

  • thanks a lot.

  • Funny, I just posted a similar solution to remove everything except 0-9...

    If you don't already have a "Tally" table, now's the time to make one... here's the code to make a Tally table...

    --===== Create and populate a Tally table on the fly

     SELECT TOP 9999

            IDENTITY(INT,1,1) AS N

       INTO dbo.Tally

       FROM Master.dbo.SysComments sc1,

            Master.dbo.SysComments sc2

    --===== Give it a primary key

      ALTER TABLE dbo.Tally

            ADD CONSTRAINT PK_Tally_N

                PRIMARY KEY CLUSTERED (N)

    --===== Give everyone "SELECT" rights

      GRANT SELECT ON dbo.Tally TO PUBLIC

    ...and that makes the solution to your problem both easy and high speed...

     CREATE FUNCTION dbo.StripChars( @String VARCHAR(32))

    RETURNS VARCHAR(32)

         AS

      BEGIN

            DECLARE @Result VARCHAR(32)

                SET @Result = ''

             SELECT @Result = @Result + SUBSTRING(@String,N,1)

               FROM dbo.Tally

              WHERE N<=LEN(@String)

                AND SUBSTRING(@String,N,1) LIKE '[A-Z]'

     RETURN (NULLIF(@Result,''))

        END

    ... It will resolve 100,000 records in about 6 seconds on my humble 1.8 Ghz desktop machine.

    Here's the usage example...

    SELECT dbo.StripChars(somecolname)

      FROM dbo.sometablename

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I use the function listed below without any added tables. So I created the given solution above and compared execution plans: they came out the same.

    ALTER FUNCTION dbo.udf_strip_non_alphabetic ( @IN_STRING VARCHAR(8000) )

    RETURNS VARCHAR(8000)

    AS 

    BEGIN

     -- If string is null, return empty

     IF ISNULL( @IN_STRING, '' ) = ''

     BEGIN

      RETURN('')

     END

     

     DECLARE @OUT_STRING AS VARCHAR( 2000 )

     

     SET @OUT_STRING = ''

     

     DECLARE @currentChar CHAR

     DECLARE @tempInt INT

     

     SET @currentChar = Substring(@IN_STRING, 1, 1)

     

     -- While the input string has length.

     WHILE ( Len( @IN_STRING ) > 0 )

     BEGIN

      -- Get the decimal value of the ASCII character, then compare

      SET @tempInt = ASCII( @currentChar )

      

      IF(@tempInt between 65 and 90

       OR @tempInt between 97 and 122)

      BEGIN

       SET @OUT_STRING = @OUT_STRING + @currentChar

      END

     

      -- Overwrite what's left of the input string, slicing off the left character

      SET @IN_STRING = substring( @IN_STRING, 2, Len( @IN_STRING ) )

      

      -- Get the next character

      SET @currentChar = substring( @IN_STRING, 1, 1 )

     

     END

     

     RETURN( @OUT_STRING )

    END

  • you guys are great.  thx.

  • Ya just gotta love what Microsoft did with their execution plans ... I absolutely agree that both execution plans came out the same for the following...

    SELECT dbo.udf_strip_non_alphabetic (ToStrip)

      FROM dbo.AlphaTest

    SELECT dbo.StripChars (ToStrip) --This one is twice as fast even though plan doesn't say so

      FROM dbo.AlphaTest

    It even says they'll both take the same amount of time (50%/50%).  But ... the first one takes 13 seconds to return the stripped 100,000 k rows from my test table and the second on takes only 6.  Doesn't sound like a lot in seconds but the second query is more than twice as fast.

    AND, If you run the execution plans (simultaneously) on the code within each function, it shows the second query loosing big time (.02% to 99.98%)!  Still, the second query runs twice as fast...  Microsoft will tell us it's not a fault... it's a feature

    Anyway, this isn't to badmouth anyone's code... it's to show 2 things... the power of a Tally table (should make one permanent) and to demonstrate that, once again, Microsoft generated execution (or is that "excretion" ) plans lie like a rug .  AND, if you bring up the performance tab of Windows Task Manager to watch the cpu usage (set refresh to high speed), they both peg the cpu at 100% for the duration of their run.  Dunno about you but if I gotta peg a cpu, I'd rather do it for 6 seconds than 13. 

    Just for grins, I ran both on a million rows of data (change 100,000 to a million in the code below)... the first query took 140 seconds and the second query only took 56.  The cpu on my desktop box was pegged for both runs (I gotta hook a garden hose up to that poor baby ).

    So, my recommendation is that execution plans are good as an "estimate" but nothing beats a test... like my ol' physics professor used to say, "One measurement is worth a thousand calculations. 

    Now, don't take my word for it  ... here's the code I used to build the test table... do your own testing and give us a jingle back... I'm always curious how these things run on different boxes so I'd really appreciate your feedback...

    --===== Create and populate a test table by using a cross join

     SELECT TOP 100000

            IDENTITY(INT,1,1) AS ID,

            CAST('AFGBS004SA55' AS VARCHAR(32)) AS ToStrip

       INTO dbo.AlphaTest

       FROM Master.dbo.SysComments sc1,

            Master.dbo.SysComments sc2

    --===== Add a primary key

      ALTER TABLE [dbo].[AlphaTest]

            ADD PRIMARY KEY CLUSTERED ([ID])

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply