June 21, 2012 at 11:00 pm
Experts,
I am trying to replace all special characters in a column with one special character.
Example:
Table: dbo.Employee
Column: Name
Name
-------
edwardneuman!"<]
mikemoreno)'>$:
JeffJensen"?>"
I am trying to get the namepart to the left of ANY special character. To achieve this, I am thinking of replacing all the special characters with a single special character so that I can find the first occurrence of that special character and grab left of the special character (SUBSTRING/CHARINDEX). This way I don't need to loop through all the special characters. Could you guys provide any suggestions?
I am expecting the following results:
Name
-------
edwardneuman<<<<
mikemoreno<<<<<
JeffJensen<<<<
Thanks in advance.
June 21, 2012 at 11:17 pm
You can take a look at this script: http://www.sqlservercentral.com/scripts/Identify+illegal+characters/89934/
It identifies "bad" characters in a column of data - you'll need to define what's "good" (the script assumes alphanumerics are good).
Using that you can:
1) insert the results into a temporary table
2) use that to do build and execute a Dynamic SQL REPLACE on all of the bad characters.
There was a thread on this several months back where GSquared and I proposed two approaches to this but I can't find it now. Both solutions would handle 1,000,000 rows in under about 8 seconds. GSquared's approach used a tally table to parse the string if I recall correctly.
If you think this will work for you and you need help taking the script through the next steps, let me know.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 21, 2012 at 11:41 pm
Thank you for your response Dwain.
Can I include the characters . - also in allowed character list? If yes, how would I do that?
I tried this, but didn't work.
SET @allowed = '%[^a-zA-Z0-9.-]%' -- Pattern to allow
Suppose if the Data is like below:
EdwardNeumann(">eneum
I would like to get only :
EdwardNeumann
Once I hit any special character, I want to stop and grab text left to that special character.
June 21, 2012 at 11:48 pm
That might actually be a little easier (grabbing text to left of first invalid character).
Unfortunately, I'm not an expert on PATINDEX. And an initial search didn't turn up a tutorial chock full of examples that might guide you.
I've asked GSquared for help. Maybe he'll come through with his Tally table solution which was faster anyway and may actually be more applicable to this particular case.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 22, 2012 at 12:01 am
Thanks Dwain. I look forward for GSquared's help and dig more into PATINDEX. I will update here if I find any solution.
June 22, 2012 at 1:27 am
Is this what you're looking for?
--Sample data
SELECT Name
INTO Employees
FROM (VALUES('edwardneuman!"<]'),
('mikemoreno)''>$:'),
('JeffJensen"?>"'),
('Good'))a(Name);
--Solution
SELECT Name AS originalName,
SUBSTRING(Name,1,ISNULL(NULLIF(PATINDEX('%[^A-Za-z0-9]%',Name),0)-1,LEN(Name))) AS noSpecials
FROM Employees;
Returns: -
originalName noSpecials
---------------- ----------------
edwardneuman!"<] edwardneuman
mikemoreno)'>$: mikemoreno
JeffJensen"?>" JeffJensen
Good Good
Or am I missing some nuance?
June 22, 2012 at 10:19 am
Thank you for your reply Cadavre. But, I am looking for something different.
I want only the characters to th left of the first occurance of any list of special characters (without looping through all the special characters one by one). But I don't want to consider . and - as my list of special characters to remove.
Example:
Current data
----------------
EdwardNuemann^"/EdNe:;23Ef
hotmail.com".])jeff<
com-cast.net>"'}myact33t
Expected result
---------------
EdwardNuemann
hotmail.com
com-cast.net
June 22, 2012 at 10:41 am
chaseurpuli (6/22/2012)
Thank you for your reply Cadavre. But, I am looking for something different.I want only the characters to th left of the first occurance of any list of special characters (without looping through all the special characters one by one). But I don't want to consider . and - as my list of special characters to remove.
Example:
Current data
----------------
EdwardNuemann^"/EdNe:;23Ef
hotmail.com".])jeff<
com-cast.net>"'}myact33t
Expected result
---------------
EdwardNuemann
hotmail.com
com-cast.net
Just add the characters into the pattern for the PATINDEX.
--Solution
SELECT Name AS originalName,
SUBSTRING(Name,1,ISNULL(NULLIF(PATINDEX('%[^A-Za-z0-9.-]%',Name),0)-1,LEN(Name))) AS noSpecials
FROM Employees;
June 22, 2012 at 11:33 am
Excellent Cadavre! I had to tweak around on my allowed and non-allowed characters. But, It gave me the result that I expected. Thanks much.
August 14, 2014 at 11:21 am
Ok. Will this work even with Line Breaks, I mean, with Char(10) ?? Will it exclude the freaking Line Break?
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply