Replace multiple characters with single character in a column

  • Experts,

    I am trying to replace all special characters in a column with one special character.

    Example:

    Table: dbo.Employee

    Column: Name

    Name

    -------

    edwardneuman!"<]

    mikemoreno)'>$:

    JeffJensen"?>"

    I am trying to get the namepart to the left of ANY special character. To achieve this, I am thinking of replacing all the special characters with a single special character so that I can find the first occurrence of that special character and grab left of the special character (SUBSTRING/CHARINDEX). This way I don't need to loop through all the special characters. Could you guys provide any suggestions?

    I am expecting the following results:

    Name

    -------

    edwardneuman<<<<

    mikemoreno<<<<<

    JeffJensen<<<<

    Thanks in advance.

  • You can take a look at this script: http://www.sqlservercentral.com/scripts/Identify+illegal+characters/89934/

    It identifies "bad" characters in a column of data - you'll need to define what's "good" (the script assumes alphanumerics are good).

    Using that you can:

    1) insert the results into a temporary table

    2) use that to do build and execute a Dynamic SQL REPLACE on all of the bad characters.

    There was a thread on this several months back where GSquared and I proposed two approaches to this but I can't find it now. Both solutions would handle 1,000,000 rows in under about 8 seconds. GSquared's approach used a tally table to parse the string if I recall correctly.

    If you think this will work for you and you need help taking the script through the next steps, let me know.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thank you for your response Dwain.

    Can I include the characters . - also in allowed character list? If yes, how would I do that?

    I tried this, but didn't work.

    SET @allowed = '%[^a-zA-Z0-9.-]%' -- Pattern to allow

    Suppose if the Data is like below:

    EdwardNeumann(">eneum

    I would like to get only :

    EdwardNeumann

    Once I hit any special character, I want to stop and grab text left to that special character.

  • That might actually be a little easier (grabbing text to left of first invalid character).

    Unfortunately, I'm not an expert on PATINDEX. And an initial search didn't turn up a tutorial chock full of examples that might guide you.

    I've asked GSquared for help. Maybe he'll come through with his Tally table solution which was faster anyway and may actually be more applicable to this particular case.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thanks Dwain. I look forward for GSquared's help and dig more into PATINDEX. I will update here if I find any solution.

  • Is this what you're looking for?

    --Sample data

    SELECT Name

    INTO Employees

    FROM (VALUES('edwardneuman!"<]'),

    ('mikemoreno)''>$:'),

    ('JeffJensen"?>"'),

    ('Good'))a(Name);

    --Solution

    SELECT Name AS originalName,

    SUBSTRING(Name,1,ISNULL(NULLIF(PATINDEX('%[^A-Za-z0-9]%',Name),0)-1,LEN(Name))) AS noSpecials

    FROM Employees;

    Returns: -

    originalName noSpecials

    ---------------- ----------------

    edwardneuman!"<] edwardneuman

    mikemoreno)'>$: mikemoreno

    JeffJensen"?>" JeffJensen

    Good Good

    Or am I missing some nuance?


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thank you for your reply Cadavre. But, I am looking for something different.

    I want only the characters to th left of the first occurance of any list of special characters (without looping through all the special characters one by one). But I don't want to consider . and - as my list of special characters to remove.

    Example:

    Current data

    ----------------

    EdwardNuemann^"/EdNe:;23Ef

    hotmail.com".])jeff<

    com-cast.net>"'}myact33t

    Expected result

    ---------------

    EdwardNuemann

    hotmail.com

    com-cast.net

  • chaseurpuli (6/22/2012)


    Thank you for your reply Cadavre. But, I am looking for something different.

    I want only the characters to th left of the first occurance of any list of special characters (without looping through all the special characters one by one). But I don't want to consider . and - as my list of special characters to remove.

    Example:

    Current data

    ----------------

    EdwardNuemann^"/EdNe:;23Ef

    hotmail.com".])jeff<

    com-cast.net>"'}myact33t

    Expected result

    ---------------

    EdwardNuemann

    hotmail.com

    com-cast.net

    Just add the characters into the pattern for the PATINDEX.

    --Solution

    SELECT Name AS originalName,

    SUBSTRING(Name,1,ISNULL(NULLIF(PATINDEX('%[^A-Za-z0-9.-]%',Name),0)-1,LEN(Name))) AS noSpecials

    FROM Employees;


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Excellent Cadavre! I had to tweak around on my allowed and non-allowed characters. But, It gave me the result that I expected. Thanks much.

  • Ok. Will this work even with Line Breaks, I mean, with Char(10) ?? Will it exclude the freaking Line Break?

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply