September 4, 2012 at 4:17 am
I have a single column table C
column is fname varchar(255)
I want to remove dirty data ie anything that is
not alphabetic char.
I cannot do this with a replace as I dont know what the
character is in advance.
I need to do this with looping ideally using common table expressions
Any idea?
September 4, 2012 at 4:57 am
You could use a function, built with PATINDEX
Create Function [dbo].[RemoveAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
While PatIndex('%[a-z]%', @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex('%[a-z]%', @Temp), 1, '')
Return @Temp
End
Using value sjhrh1049jsjf303
select dbo.removealphacharacters ('sjhrh1049jsjf303')
Returns
1049303
September 4, 2012 at 5:00 am
How do I do this for all the values in my column fname?
September 4, 2012 at 5:05 am
pass in your column name instead of the string
select dbo.removealphacharacters (fname) from SomeTableWhichContainsAColumnCalledFname
September 4, 2012 at 7:54 am
sj999 (9/4/2012)
I need to do this with looping ideally using common table expressionsAny idea?
Yes... you don't need typical looping for this. You need a high speed psuedo-cursor in the form of a CROSS JOIN with a Tally Table or a cteTally. This will also allow an all-in-one query that will allow you to build the function as an iTVF or "Inline Table Value Function" for a real boost in performance.
First, see the article at the following link for what a Tally Table is and how it can be used to replace certain types of loops like this one.
http://www.sqlservercentral.com/articles/T-SQL/62867/
Next, build a permanent Tally Table. Like this...
--===================================================================
-- Create a Tally table from 1 to 11000
--===================================================================
--===== Create and populate the Tally table on the fly.
SELECT TOP 11000
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.sys.ALL_Columns ac1
CROSS JOIN Master.sys.ALL_Columns ac2
;
--===== Add a CLUSTERED Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
;
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
;
GO
Then, create the following general purpose string cleaner (Limited to VARCHAR(8000))...
CREATE FUNCTION dbo.CleanString
(@pString VARCHAR(8000),@pKeepPattern VARCHAR(8000))
RETURNS TABLE
AS
RETURN
SELECT CleanedString =
(
SELECT SUBSTRING(@pString, t.N, 1)
FROM dbo.Tally t
WHERE t.n <= LEN(@pString)
AND SUBSTRING(@pString, t.N, 1) LIKE @pKeepPattern COLLATE Latin1_General_BIN
ORDER BY t.n
FOR XML PATH('')
)
;
After that, everything gets easy...
SELECT yt.OriginalString,c.CleanedString
FROM dbo.YourTable yt
CROSS APPLY dbo.CleanString(yt.OriginalString,'[^A-Za-z]')
;
That code will keep everything except the letters of the alphabet.
If you want to keep only the numeric digets of 0 through 9, the call would be similar using a different LIKE filter.
SELECT yt.OriginalString,c.CleanedString
FROM dbo.YourTable yt
CROSS APPLY dbo.CleanString(yt.OriginalString,'[0-9]')
;
If you prefer to not use a Tally Table (it's really good for rapid and simple development), you can build the high speed no-reads equivalent of a Tally Table using a rendition of Itzek Ben-Gans cCTEs (Cascading CTEs) in the function like this. It's a bit more complicated but it produces virtually zero reads and is about as fast as using a permanent Tally Table.
CREATE FUNCTION dbo.CleanString
(@pString VARCHAR(8000),@pKeepPattern VARCHAR(8000))
RETURNS TABLE
AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
--===== Clean the string according to the "keep" pattern.
SELECT CleanedString =
(
SELECT SUBSTRING(@pString, t.N, 1)
FROM cteTally t
WHERE t.n <= LEN(@pString)
AND SUBSTRING(@pString, t.N, 1) LIKE @pKeepPattern COLLATE Latin1_General_BIN
ORDER BY t.n
FOR XML PATH('')
)
;
Whether you use the Tally Table version or the cteTally version, both are in a high performance iTVF. As you've seen, it requires a slightly different usage than a slower scalar function but it's worth it.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply