July 6, 2012 at 5:15 am
Hello there,
I have this table having a varchar column which contains
numeric data + all kind of junk data (non-numeric chars) - not something in our environment control.
For some data processing I want to convert these values to numeric whenever possible -
so need to filter records which contain numeric values only.
As numeric value rules, the value should *only* have
- one or more digits [0-9]*
- zero or more comma before decimal point [,]+
- zero or one decimal point [.]?
- an optional(zero or one) minus/plus sign *in the beginning* [\+\-]?
I tried to use PATINDEX() as
WHERE PATINDEX('[\+\-]?[0-9,]*[.]?[0-9]*', varcharValue) > 0
But this doesn't seem to work 🙁
It seems PATINDEX() doesn't allow this kind of pattern matching and I need to go via CLR route.
Can anyone please confirm.
July 6, 2012 at 5:37 am
The following will determine which rows have valid numerics. If you need to convert the string to a number you may need another routine such as my BigInt converter.
declare @t table (strval varchar(10))
insert into @t values ('+123.45');
insert into @t values ('-12,345.67');
insert into @t values ('-1-2');
select isnumeric(strval) from @t;
/*
-- Tests pass isnumeric AND fail IsBigInt AND fail cast(vc as bigint)
-- range
SELECT IsNumeric('-9223372036854775809'), dbo.IsBigInt('-9223372036854775809')
SELECT IsNumeric('9223372036854775808'), dbo.IsBigInt('9223372036854775808')
-- invalid chars
SELECT IsNumeric('-5d2'), dbo.IsBigInt('-5d2')
SELECT IsNumeric('-5e2'), dbo.IsBigInt('-5e2')
SELECT IsNumeric('+3,4'), dbo.IsBigInt('+3,4')
SELECT IsNumeric('+3.4'), dbo.IsBigInt('+3.4')
-- pass this strange case
SELECT IsNumeric('00000000000000000000000000001'), dbo.IsBigInt('00000000000000000000000000001')
*/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.IsBigInt') AND type IN (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION dbo.IsBigInt
GO
CREATE FUNCTION dbo.IsBigInt (@a varchar(30))
returns bit
AS
BEGIN
-- Submitted to SqlServerCentral by William Talada
DECLARE
@s-2 varchar(30),
@i int,
@IsNeg bit,
@valid int
-- assume the best
SET @valid = 1
SET @IsNeg=0
SET @s-2 = ltrim(rtrim(@a))
-- strip OFF negative sign
IF len(@s) > 0
AND LEFT(@s, 1) = '-'
BEGIN
SET @IsNeg=1
SET @s-2 = RIGHT(@s, len(@s) - 1)
END
-- strip OFF positive sign
IF len(@s) > 0
AND LEFT(@s, 1) = '+'
BEGIN
SET @s-2 = RIGHT(@a, len(@a) - 1)
END
-- strip leading zeros
while len(@s) > 1 and left(@s,1) = '0'
set @s-2 = right(@s, len(@s) - 1)
-- 19 digits max
IF len(@s) > 19 SET @valid = 0
-- the rest must be numbers only
SET @i = len(@s)
WHILE @i >= 1
BEGIN
IF charindex(substring(@s, @i, 1), '0123456789') = 0 SET @valid = 0
SET @i = @i - 1
END
-- check range
IF @valid = 1
AND len(@s) = 19
BEGIN
IF @isNeg = 1 AND @s-2 > '9223372036854775808' SET @valid = 0
IF @IsNeg = 0 AND @s-2 > '9223372036854775807' SET @valid = 0
END
RETURN @valid
END
go
July 6, 2012 at 8:53 am
This was removed by the editor as SPAM
July 6, 2012 at 9:23 am
Yes, that's what we are currentlly using, but it is giving unexpected result sometimes -
for instance -
- IsNumeric(x) returns true - when varchar value has both comma and a tab character (Char(9)),
- IsNumeric(x) returns false - when varchar value has only tab character and no comma
July 9, 2012 at 1:50 am
This was removed by the editor as SPAM
July 9, 2012 at 5:18 am
SELECT *
FROM (SELECT ID = 1, somecolumn = '399,243.01' UNION ALL -- YES
SELECT ID = 2, '£399,243.01' UNION ALL -- NO
SELECT ID = 3, '39924301' UNION ALL -- YES
SELECT ID = 4, '399243.01' UNION ALL -- YES
SELECT ID = 5, '399,24301' UNION ALL -- YES
SELECT ID = 6, '399.243,01' UNION ALL -- YES
SELECT ID = 7, '399 243,01' UNION ALL -- NO
SELECT ID = 8, '-399.243,01' UNION ALL -- YES
SELECT ID = 9, '+399.243,01' UNION ALL -- YES
SELECT ID = 10, '_399.243,01' -- NO
) a
WHERE somecolumn NOT LIKE '%[^,.0-9+-]%'
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 9, 2012 at 2:56 pm
If you go via a CLR regex function, you can try this regex to see if it satisfies your spec:
(?xm)^[+-]?\d{1,3}(,\d{3})+(\.\d+)?$
|
^[+-]?\d{1,3}(\.\d+)?$
tested in RegexBuddy vs this input:
-4.012
6berry
+8
-234,345,234.012
7
pbo.PoBox34
got mathes:
-4.012
+8
-234,345,234.012
7
without seeing your real input strings it's hard to write a regex that would fit it all though.
July 9, 2012 at 4:21 pm
I think this gets you pretty close, w/o having to resort to CLR:
WHERE
--one +/- optional, but must be first if used
PATINDEX('%[+-]%', varcharValue) IN (0, 1) AND
varcharValue NOT LIKE '%[+-]%[+-]%' AND
--only one decimal allowed
varcharValue NOT LIKE '%.%.%' AND
--only chars shown allowed; '-' must be LAST, or it will be taken as a range indicator, not a char
varcharValue NOT LIKE '%[^0-9,.+-]%'
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply