December 18, 2012 at 4:22 am
SELECT accountid,New_Column = REPLACE(telephone1,SUBSTRING(telephone1,PATINDEX('[^0-9]',telephone1),1),'')
INTO #YourNewResults
FROM #TelephoneTable
I have the script above which I thought would remove all non-numerical characters from the field but this not the case?
I have some fields with numbers+characters and some with characters only regardless of the combination I want all characters removed.
I have seen there are a lot of functions is this the best way? If so where would the function be applied in the script?
December 18, 2012 at 6:48 am
Hi,
The best way in my opinion would be to use Regex and some CLR functions to replace the characters. Read this article:
http://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/[/url]
Once you've got the functions installed then it's as simple as:
SELECT dbo.RegExReplace('ABC123ABC','[A-Z]','')
December 18, 2012 at 7:08 am
@Sachin,
I agree that the CLR Regex method should be the fastest although I've not seen any performance testing on it.
The questions I have for you is...
1. Can you use CLR or do you need a 100% T-SQL solution?
2. Is the maximum length of the data <= 8K bytes?
I no longer believe this. In the last 2 weeks, I've seen CLR Regex get its doors blown off by more than one example of some good, ol' fashioned T-SQL. CLR is still the fastest for splitting a string and it might be the fastest for cleaning a string, but CLR Regex probably won't be. Regex itself seems to have been over promised and under delivered. It appears that you pay in performance what you thought you gained in flexibility.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 18, 2012 at 7:16 am
1. I would prefer t-SQL as CLR is way beyond my understanding
2. At the moment I'm not concerned about performance
December 18, 2012 at 7:35 am
Here's one way to do it in T-SQL:
DECLARE @String VARCHAR(8000) = 'ABC12D34E56';
WITH Seeds(Seed)
AS (SELECT *
FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS V (C)),
Numbers(Number)
AS (SELECT TOP (8000)
ROW_NUMBER() OVER (ORDER BY S1.Seed)
FROM Seeds AS S1
CROSS JOIN Seeds AS S2
CROSS JOIN Seeds AS S3
CROSS JOIN Seeds AS S4)
SELECT (
SELECT SUBSTRING(@String, Number, 1)
FROM Numbers
WHERE Number <= LEN(@String)
AND SUBSTRING(@String, Number, 1) LIKE '[0-9]'
ORDER BY Number
FOR XML PATH(''),
TYPE).value('.[1]', 'VARCHAR(8000)');
Alternatively, if you need to apply this to a table instead of a variable:
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T
(ID INT IDENTITY
PRIMARY KEY,
Col1 VARCHAR(8000));
INSERT INTO #T
(Col1)
VALUES ('123A'),
('B1C2D3');
WITH Seeds(Seed)
AS (SELECT *
FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS V (C)),
Numbers(Number)
AS (SELECT TOP (8000)
ROW_NUMBER() OVER (ORDER BY S1.Seed)
FROM Seeds AS S1
CROSS JOIN Seeds AS S2
CROSS JOIN Seeds AS S3
CROSS JOIN Seeds AS S4)
SELECT *
FROM #T AS T
CROSS APPLY (SELECT (
SELECT SUBSTRING(Col1, Number, 1)
FROM Numbers
WHERE Number <= LEN(Col1)
AND SUBSTRING(Col1, Number, 1) LIKE '[0-9]'
ORDER BY Number
FOR XML PATH(''),
TYPE).value('.[1]', 'VARCHAR(8000)') AS Stripped) AS Parser;
In either case, it uses a "Runtime Numbers Table" to parse the string into individual characters, then strips out anything other than digits 0-9, then uses For XML to put it all back together. (FOR XML PATH with a zero-length root indicated by (''), and no column name for the query, will concatenate strings together nicely. It's a documented trick that comes in very handy.)
Does that help?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
December 24, 2012 at 6:39 pm
This can also be done using the PatternSplitCM function described in the fourth article in my signature line (Splitting Strings Based on Patterns).
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T
(ID INT IDENTITY
PRIMARY KEY,
Col1 VARCHAR(8000));
INSERT INTO #T
(Col1)
VALUES ('123A'),
('B1C2D3');
;WITH CTE AS (
SELECT *
FROM #T a
CROSS APPLY PatternSplitCM(a.Col1, '[0-9]')
WHERE [Matched] = 1)
SELECT ID, Col1=MAX(Col1), Col2=(
SELECT '' + Item
FROM CTE b
WHERE a.ID = b.ID
ORDER BY ItemNumber
FOR XML PATH(''))
FROM CTE a
GROUP BY ID
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T;
In fact the inspiration for that article (described therein) was a forum-posted question that was quite similar to this one.
One caveat though. Since you're working in SQL 2005, you'll need to replace the numbers table with a Ben-Gan style Tally table something like this one (from the PatternSplitQU FUNCTION also in that article):
WITH Nbrs_3(n) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1)
,Nbrs_2 (n) AS (SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2)
,Nbrs_1 (n) AS (SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2)
,Nbrs_0 (n) AS (SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2)
,Tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY n) As n FROM Nbrs_0)
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 24, 2012 at 6:46 pm
GSquared (12/18/2012)
Here's one way to do it in T-SQL:
DECLARE @String VARCHAR(8000) = 'ABC12D34E56';
WITH Seeds(Seed)
AS (SELECT *
FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS V (C)),
Numbers(Number)
AS (SELECT TOP (8000)
ROW_NUMBER() OVER (ORDER BY S1.Seed)
FROM Seeds AS S1
CROSS JOIN Seeds AS S2
CROSS JOIN Seeds AS S3
CROSS JOIN Seeds AS S4)
SELECT (
SELECT SUBSTRING(@String, Number, 1)
FROM Numbers
WHERE Number <= LEN(@String)
AND SUBSTRING(@String, Number, 1) LIKE '[0-9]'
ORDER BY Number
FOR XML PATH(''),
TYPE).value('.[1]', 'VARCHAR(8000)');
Alternatively, if you need to apply this to a table instead of a variable:
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T
(ID INT IDENTITY
PRIMARY KEY,
Col1 VARCHAR(8000));
INSERT INTO #T
(Col1)
VALUES ('123A'),
('B1C2D3');
WITH Seeds(Seed)
AS (SELECT *
FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS V (C)),
Numbers(Number)
AS (SELECT TOP (8000)
ROW_NUMBER() OVER (ORDER BY S1.Seed)
FROM Seeds AS S1
CROSS JOIN Seeds AS S2
CROSS JOIN Seeds AS S3
CROSS JOIN Seeds AS S4)
SELECT *
FROM #T AS T
CROSS APPLY (SELECT (
SELECT SUBSTRING(Col1, Number, 1)
FROM Numbers
WHERE Number <= LEN(Col1)
AND SUBSTRING(Col1, Number, 1) LIKE '[0-9]'
ORDER BY Number
FOR XML PATH(''),
TYPE).value('.[1]', 'VARCHAR(8000)') AS Stripped) AS Parser;
In either case, it uses a "Runtime Numbers Table" to parse the string into individual characters, then strips out anything other than digits 0-9, then uses For XML to put it all back together. (FOR XML PATH with a zero-length root indicated by (''), and no column name for the query, will concatenate strings together nicely. It's a documented trick that comes in very handy.)
Does that help?
Gus - I think the OP may have a problem with your Tally table (Seeds CTE) for the same reason I noted above.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 25, 2012 at 8:58 pm
Jeff Moden (12/18/2012)
@Sachin,I agree that the CLR Regex method should be the fastest although I've not seen any performance testing on it.
The questions I have for you is...
1. Can you use CLR or do you need a 100% T-SQL solution?
2. Is the maximum length of the data <= 8K bytes?
Actually, I have to take that back. CLR would be the fastest method but only if you DON"T use Regex. I've recently seen many performance tests and either some dedicated CLR or some dedicated T-SQL will usually smoke RegEx according to the tests I've seen.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 25, 2012 at 9:01 pm
Sachin 80451 (12/18/2012)
1. I would prefer t-SQL as CLR is way beyond my understanding2. At the moment I'm not concerned about performance
I whole heartedly agree with #1 above. I never agree with #2 above because they always come back about the performance problem that a given solution is having. The only thing you should be more concerned about than performance is accuracy and the two should be virtually tied as being the most important. If you think not, spend an hour or two looking at the thousands of performance issues people are asking about on this forum alone.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 25, 2012 at 9:08 pm
Jeff Moden (12/25/2012)
Jeff Moden (12/18/2012)
@Sachin,I agree that the CLR Regex method should be the fastest although I've not seen any performance testing on it.
The questions I have for you is...
1. Can you use CLR or do you need a 100% T-SQL solution?
2. Is the maximum length of the data <= 8K bytes?
Actually, I have to take that back. CLR would be the fastest method but only if you DON"T use Regex. I've recently seen many performance tests and either some dedicated CLR or some dedicated T-SQL will usually smoke RegEx according to the tests I've seen.
I've got your back Jeff.
Here's one example: http://www.sqlservercentral.com/Forums/Topic1390297-3122-5.aspx
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 3, 2013 at 4:05 am
I have no experience in CRL, so would like T-SQL for now please?
the answer to the second question is yes.
January 4, 2013 at 10:03 am
Sorry to be thick-headed (read: slow-witted) but I have to ask: why did you SELECT TOP (8000) ROW_NUMBER... when it [seems] to work with TOP (10). Since we're using only the numbers 0-9, I would have figured using a generic "number generator" like SELECTING ROW_NUMBER() ... from a "known" source that would have at least 10 records (like sys.objects), you could skinny down the code and maybe processing as well:
DECLARE @String VARCHAR(8000) = 'ABC12D34E56';
SELECT (
SELECT SUBSTRING(@String, n.[Number], 1)
FROM (
SELECT TOP (10) ROW_NUMBER()
OVER (ORDER BY [object_id]) AS [Number]
FROM sys.objects) n
WHERE n.[Number] <= LEN(@String)
AND SUBSTRING(@String, n.[Number], 1) LIKE '[0-9]'
ORDER BY n.[Number]
FOR XML PATH(''), TYPE).[value]('.[1]', 'VARCHAR(8000)');
Factoring in inflation, health care, taxes and cost of living, my two cents isn't worth much so I apologize if I missed something. I'll blame the lack of coffee for any oversight on my part.
Cheers,
Ken
January 4, 2013 at 10:58 am
Your Name Here (1/4/2013)
@GSquared:Sorry to be thick-headed (read: slow-witted) but I have to ask: why did you SELECT TOP (8000) ROW_NUMBER... when it [seems] to work with TOP (10). Since we're using only the numbers 0-9, I would have figured using a generic "number generator" like SELECTING ROW_NUMBER() ... from a "known" source that would have at least 10 records (like sys.objects), you could skinny down the code and maybe processing as well:
DECLARE @String VARCHAR(8000) = 'ABC12D34E56';
SELECT (
SELECT SUBSTRING(@String, n.[Number], 1)
FROM (
SELECT TOP (10) ROW_NUMBER()
OVER (ORDER BY [object_id]) AS [Number]
FROM sys.objects) n
WHERE n.[Number] <= LEN(@String)
AND SUBSTRING(@String, n.[Number], 1) LIKE '[0-9]'
ORDER BY n.[Number]
FOR XML PATH(''), TYPE).[value]('.[1]', 'VARCHAR(8000)');
Factoring in inflation, health care, taxes and cost of living, my two cents isn't worth much so I apologize if I missed something. I'll blame the lack of coffee for any oversight on my part.
Cheers,
Ken
Try it on a much larger string and see why the 8000 becomes important.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 4, 2013 at 11:45 am
:exclamationmark:
I get it now. Thanks!
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply