August 7, 2006 at 3:28 pm
Does anyone now a quick way to remove any non - numeric data from a number (varchar field).
Example: 12345XXX = 12345 , 12## = 12, 1233%%%456 = 1233456
The code needs to be as efficient as possible as it will be performed on a query possibly hitting a million rows.
Thank you ...
August 7, 2006 at 4:12 pm
this kind of task needs to be handled by a regular expression, there is no other way in my opinion. In SQL 2005 u can write a CLR function that would match all non-digits in your string
1233%%%45##6
and replace them with nothing, resulting in the cleaned output
1234456
regex match pattern for this is very simple:
\D [meaning non-digit]
it would work blazingly fast even on huge tables.
u cannot do it in SQL 2000 obviously b/c regex is not supported there.
August 7, 2006 at 11:23 pm
Create scalar UDF taking varchar value as an argument and returning integer value.
Then update your table once using this function.
Don't be afraid to use loop inside of this UDF. Even with loop it will be fast enough, much faster than UPDATE itself. Of course, if you don't reference any table from this UDF.
And than you include this UDF into trigger validating inserted values for this table and converting them to proper format.
_____________
Code for TallyGenerator
August 8, 2006 at 9:58 am
Thank you for your replies. I need to store the data in the original format in my table, any non-numeric characters are masking characters to hide certain data. So ideally it would be handled by a UDF. I think I will have to stick with using the REPLACE function for the most common set of non-numeric characters that generally come up. (** The data is from outside sources that we have no control over **)
August 8, 2006 at 2:18 pm
Create new table and set up trigger to transfer inserted/data from old table to new one. Do select from new table.
If you store rubbish, you'll get rubbish back. No other way. Sorry.
_____________
Code for TallyGenerator
August 10, 2006 at 1:47 am
If you took the code below (or something like it) and made it into a udf it should be close to what you are after. However, it might not be "lightning fast"!
DECLARE @str_val VARCHAR(100)
DECLARE @ret_val VARCHAR(100)
DECLARE @pos SMALLINT
DECLARE @len SMALLINT
SET @str_val = '1233%%%456'
IF ISNUMERIC(@str_val) = 1
SET @ret_val = @str_val
ELSE
BEGIN
SET @ret_val = ''
SET @pos = 1
SET @len = LEN(@str_val)
WHILE (@pos <= @len)
BEGIN
IF SUBSTRING(@str_val, @pos, 1) LIKE '%[0-9]%'
SET @ret_val = @ret_val + SUBSTRING(@str_val, @pos, 1)
SET @pos = @pos + 1
END
END
PRINT '@ret_val => ' + @ret_val
--------------------------------
RETURNS: @ret_val => 1233456
August 10, 2006 at 7:20 am
i still think the best way for this kind of text processing (cleanup) is creating a Regular Expression CLR function and then invoking it on the whole field:
C# code for the function:
public striing CleanString()
{ //declare an input string of text
string InputString = @"1233%%%456XYZYZY";
//replace everything other than \d with an empty string string ResultString = Regex.Replace(ResultString, @"[^\d]", "");
return ResultString; }
the code looks much less confusing (no Substring of Pathindex nonsense); works very fast ; was able to process 100k recs in < 1 min
August 10, 2006 at 7:25 am
sorry for misprints:
C# code for the function:
public striing CleanString()
{ //declare an input string of text
string InputString = @"1233%%%456XYZYZY";
//replace everything other than \d with an empty string
string ResultString = Regex.Replace(InputString, @"[^\d]", "");
return ResultString;
//returns: 1233456
}
August 10, 2006 at 7:30 am
Thank you. Unfortunately we are still on 2000 - Hopefully by the end of the year I can start migrating our application to 2005.
August 10, 2006 at 8:43 am
Great, thats totally what I was looking for. Thank you!
August 10, 2006 at 8:48 am
Here's a UDF that may be more reliable, just in case...
--================================
-- Create the function
--================================
--DROP FUNCTION dbo.fNumbersOnly
GO
CREATE FUNCTION dbo.fNumbersOnly
(
@STR varchar(50)
)
RETURNS varchar(50)
AS
BEGIN
DECLARE @pos int
SET @pos = PatIndex('%[^0-9]%', @STR)
WHILE @pos > 0
BEGIN
SET @STR = Stuff(@str, @pos, 1, '') --delete invalid char
SET @pos = PatIndex('%[^0-9]%', @STR)
END
RETURN @STR
END
--================================
-- Test it
--================================
PRINT dbo.fNumbersOnly('1234%%%56')
PRINT dbo.fNumbersOnly('x1234%%%5y6z')
PRINT dbo.fNumbersOnly('123456')
PRINT dbo.fNumbersOnly('x123456x')
PRINT dbo.fNumbersOnly('1x2y3z4a5b6')
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply