how to avoid bad character in sql server

  • I have a question about SQL Server.

    Table Emp:

    Id | Name

    1 |test“te d'abc

    2 |anr¤a

    3 | gs-re-C“te d'ab

    4 |M‚fe, DF

    5 |R™temd

    6 |™jad”ji

    7 |Cje y ret¢n

    8 |J™kl™balu

    9 |le“ne-iokd

    10 |liode-Pyr‚n‚ie

    11 |V„s G”ta

    12 |Sƒo Paulo

    13 |vAstra gAtaland

    14 |¥uble / Bio-Bio

    15 |U“pl™n/ds VAsb-y

    I need to replace the special characters with empty values.

    Based on the above table, I want to output something like below. I tried like this:

    select

    REPLACE(state, SUBSTRING(state, PATINDEX('%[^"a-z"]%', state), 1), '') as name,

    id

    from emp

    This query doesn't return the expected result.

    Output is:

    Id | Name

    1 |testtedabc

    2 |anra

    3 |gsreCedab

    4 |MfeDF

    5 |Rtemd

    6 |jadji

    7 |Cjeyretn

    8 |Jkbalu

    9 |eneiokd

    10 |iodePyrnie

    11 |VsGta

    12 |SoPaulo

    13 |vAstragAtaand

    14 |ubleBioBio

    15 |UpndsVAsby

    Please tell me how to write a query to achieve this task in SQL Server.

  • Why is the output wrong. What is the expected output?

    -- Gianluca Sartori

  • we need replace bad character with empty string and expected output like below.

    id | Name

    1 |testtedabc

    2 |anra

    3 |gsreCedab

    4 |MfeDF

    5 |Rtemd

    6 |jadji

    7 |Cjeyretn

    8 |Jkbalu

    9 |eneiokd

    10 |iodePyrnie

    11 |VsGta

    12 |SoPaulo

    13 |vAstragAtaand

    14 |ubleBioBio

    15 |UpndsVAsby

    Please tell me how to write a query to achieve this task in SQL Server.

  • This should do the trick:

    DECLARE @sampleData TABLE (

    Id int,

    Name nvarchar(50)

    )

    INSERT INTO @sampleData VALUES

    (1 ,'test“te d''abc')

    ,(2 ,'anr¤a')

    ,(3 ,' gs-re-C“te d''ab')

    ,(4 ,'M‚fe, DF')

    ,(5 ,'R™temd')

    ,(6 ,'™jad”ji')

    ,(7 ,'Cje y ret¢n')

    ,(8 ,'J™kl™balu')

    ,(9 ,'le“ne-iokd')

    ,(10 ,'liode-Pyr‚n‚ie')

    ,(11 ,'V„s G”ta')

    ,(12 ,'Sƒo Paulo')

    ,(13 ,'vAstra gAtaland')

    ,(14 ,'¥uble / Bio-Bio')

    ,(15 ,'U“pl™n/ds VAsb-y');

    SELECT *

    FROM @sampleData AS s

    CROSS APPLY (

    SELECT Name = (

    SELECT SUBSTRING(s.name, number, 1) AS c

    FROM master.dbo.spt_values

    WHERE type = 'P'

    AND number > 0

    AND (

    ASCII(SUBSTRING(s.name, number, 1)) BETWEEN 97 AND 122

    OR

    ASCII(SUBSTRING(s.name, number, 1)) BETWEEN 65 AND 90

    )

    ORDER BY number

    FOR XML PATH(''), TYPE

    ).value('.','nvarchar(50)')

    ) AS t

    -- Gianluca Sartori

  • If this is something you need to do frequently, you can create an inline table-valued function to do the cleaning for you. Here's one:

    CREATE FUNCTION dbo.CleanAlphaNumeric(@OriginalText Varchar(8000)) RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    WITH

    E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) dt(n)),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    Tally(N) AS (

    SELECT TOP (ISNULL(DATALENGTH(@OriginalText), 0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteCleaned(CleanText) AS (

    SELECT (SELECT SUBSTRING(@OriginalText, t.N, 1)

    FROM Tally t

    WHERE ASCII(SUBSTRING(@OriginalText, t.N, 1)) BETWEEN 48 AND 57 --0-9

    OR ASCII(SUBSTRING(@OriginalText, t.N, 1)) BETWEEN 65 AND 90 --A-Z

    OR ASCII(SUBSTRING(@OriginalText, t.N, 1)) BETWEEN 97 AND 122 --a-z

    OR ASCII(SUBSTRING(@OriginalText, t.N, 1)) = 32 --space

    FOR XML PATH(''), TYPE).value('.','varchar(8000)')

    )

    SELECT CleanText

    FROM cteCleaned;

    Then all you have to do is use the function to clean the undesirable characters from your column. For more information on how a tally table works, see http://www.sqlservercentral.com/articles/T-SQL/62867/.

  • Hi except one case everything is working fine.if numbers avilable in state column then we need numbers also in the output.

    example: State column have value : abc123&~ then we need output is abc123.

    I added extract code ASCII(SUBSTRING(s.state, number, 1)) BETWEEN 48 AND 57 in the above query but numbers are missing. if available in state then number required in the output.please tell me where we need modify in the provide code.

  • SELECT *

    FROM @sampleData AS s

    CROSS APPLY (

    SELECT Name = (

    SELECT SUBSTRING(s.name, number, 1) AS c

    FROM master.dbo.spt_values

    WHERE type = 'P'

    AND number > 0

    AND (

    ASCII(SUBSTRING(s.name, number, 1)) BETWEEN 97 AND 122

    OR

    ASCII(SUBSTRING(s.name, number, 1)) BETWEEN 65 AND 90

    OR

    ASCII(SUBSTRING(s.name, number, 1)) BETWEEN 48 AND 57

    )

    ORDER BY number

    FOR XML PATH(''), TYPE

    ).value('.','nvarchar(50)')

    ) AS t

    You can do the same thing on Ed's excellent answer.

    -- Gianluca Sartori

  • I prefer to use a nice pattern instead of learning ASCII values. But that's just me. 😀

    SELECT *

    FROM @sampleData AS s

    CROSS APPLY (

    SELECT Name = (

    SELECT SUBSTRING(s.name, number, 1) AS c

    FROM master.dbo.spt_values

    WHERE type = 'P'

    AND number > 0

    AND SUBSTRING(s.name, number, 1) LIKE '[a-zA-Z0-9 ]' COLLATE Latin1_General_Bin

    ORDER BY number

    FOR XML PATH(''), TYPE

    ).value('.','nvarchar(50)')

    ) AS t

    EDIT: Added collation

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (3/18/2016)


    I prefer to use a nice pattern instead of learning ASCII values. But that's just me. 😀

    Me too, but for some reason it didn't catch all the "bad" characters in the test data, so I decided to use ASCII.

    Apparently, SQL Server thinks that "™" matches the [A-Z] pattern.

    -- Gianluca Sartori

  • spaghettidba (3/18/2016)


    Luis Cazares (3/18/2016)


    I prefer to use a nice pattern instead of learning ASCII values. But that's just me. 😀

    Me too, but for some reason it didn't catch all the "bad" characters in the test data, so I decided to use ASCII.

    Apparently, SQL Server thinks that "™" matches the [A-Z] pattern.

    I forgot to include the collation. Using a binary collation won't have that problem. I fixed my previous code.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply