Removing High Ascii (>128)

  • In text fields I have high ascii (>128) mainly as a result of users cut and pasting from MS-WORD chars like "smart quotes" ascii 143 & 146 i think.

    These are causing me trouble downstream in xml processes and I need to remove them.

    Can anyone suggest an update statement to remove them from anywhere in a field ?

    Or can I solve this by changing the collation ?

     

  • (n)textfields or (var)char fields?

    If varchar you can write an user defined function

    Either with the function ASCII (returns int value of the first character)

    or with calling REPLACE  x times.

     

  • Actually, I believe the smart quotes are CHAR(147) and CHAR(148)... run the following in Query Analyzer and see if you agree...

    PRINT CHAR(147)

    PRINT CHAR(148)

    PRINT CHAR(143)

    PRINT CHAR(146)

    PRINT ASCII('“') --Left smart quote from Word

    PRINT ASCII('”') --Right smart quote from Word

    PRINT ASCII('"') --Regular double-quote

    If you want to just remove the smart quotes altogether, do this...

    UPDATE yourtablename

       SET thecolumn = REPLACE(REPLACE(thecolumn,CHAR(147),''),CHAR(148),'')

    ...that will simply delete the smart quotes wherever they occur.

    If what you really want to do is replace the smart quotes with regular double-quotes, then do this...

    UPDATE yourtablename

       SET thecolumn = REPLACE(REPLACE(thecolumn,CHAR(147),CHAR(34)),CHAR(148),CHAR(34))

    Of course, unless you've memorized the ANSI character set, that's not exactly self-documenting... I'd be more tempted to use something like this...

    --===== Replace smart quotes with nothing (deletes smart quotes)

     UPDATE yourtablename

        SET thecolumn = REPLACE(REPLACE(thecolumn,'“',''),'”','')

    ...or...

    --===== Replace smart quotes with regular double quotes

     UPDATE yourtablename

        SET thecolumn = REPLACE(REPLACE(thecolumn,'“','"'),'”','"')

    If you need to delete all characters the have an ASCII value of 128 or more, then we'll likely need a function... that'll have to be in a different post.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for that but I am really after a generic solution to remove any char >128, from anywhere in a field or record.  Smart quotes are just one example, I didn't want to do 127 different replace calls..    Any function ideas gratefully recieved.

    Could altering the collation on that table be a solution ?  Is there a collation which would remove the high ascii ?

     

     

  • A quick-and-dirty method would be:

    declare @i int

    set @i = 128

    while @i < 256

    begin

    --print cast(@i as varchar) + ' ' + CHAR(@i)

    UPDATE yourtablename

    SET thecolumn = REPLACE(thecolumn,CHAR(@i),'')

    set @i = @i + 1

    end

  • If the fields are varchar then

    create the numbers table thus

    CREATE TABLE [numbers] (number int PRIMARY KEY CLUSTERED)

    and populate with numbers from 1 to 8000

    and use

    DECLARE @result varchar(8000)

    SET @result = ''

    SELECT @result = @result + SUBSTRING(@var,n.number,1)

    FROM [numbers] n

    WHERE n.number <= LEN(@var)

    AND ASCII(SUBSTRING(@var,n.number,1)) < 128

    ORDER BY n.number

    SELECT @result

    You could put this in a function as well

    Far away is close at hand in the images of elsewhere.
    Anon.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply