September 18, 2005 at 8:16 pm
Although the bugs are well-known, I thought it would be interesting to hear how people deal with the fact that ISNUMERIC regularly returns a value of 1 (True) when the input string is clearly not numeric. I am also interested in how people deal with the fact that using CONVERT (to non-money numeric formats) causes non-trappable errors when dealing with commas, or when handling text that is erroneously called a number by ISNUMERIC.
To deal with this problem, I recently had to write my own function to do conversions without the non-trappable errors. It wasn't pretty, but at least my custom function doesn't think the bizarre strings listed below are valid numbers. Anyone know if this is fixed in Yukon?
(all of the strings below return 1 from ISNUMERIC; a quick check follows the list)
'-' --single non-numeric characters
'+'
','
'$'
',0' --misplaced commas, or missing digits causing clearly defective numbers
'1,'
'1,0'
'1.0,'
',1.0'
'1.0,012,'
'1.0,012,000'
'$1,00' --CONVERT bombs if converting to non-money numeric formats
'$1,00.'
'$1,00.0,'
'10E2' --legacy exp formats that cause many problems with conversion
'10D2'
'0000000.,01' --more bizarre forms that clearly indicate data-entry errors
'.,'
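A quick way to see it (just a spot check, assuming plain varchar input on SQL Server 2000):

select ISNUMERIC('$')    as dollar_sign,
       ISNUMERIC(',')    as lone_comma,
       ISNUMERIC('1,')   as trailing_comma,
       ISNUMERIC('10D2') as legacy_exp       -- all four return 1
-- ...yet converting one of them to a non-money numeric type kills the batch:
-- select CONVERT(decimal(10,2), '1,')       -- raises a conversion error and aborts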
September 19, 2005 at 2:56 am
I prefer to use LIKE and wildcards; it gives more control over what type of characters you want to qualify (this assumes char/varchar datatypes for the data you are validating).
create table #x ( field varchar(10) not null, type varchar(20) not null )
GO
insert #x select 'abc', 'alpha only'
insert #x select 'abc1', 'alpha and numeric'
insert #x select '123' , 'numeric only'
insert #x select 'a2c', 'alpha and numeric'
insert #x select 'abc.-!', 'alpha + special'
insert #x select '.-%!', 'special'
GO
select * from #x
where FIELD NOT LIKE '%[^0-9]%'  -- keep only rows with no character outside 0-9, i.e. digits only
GO
drop table #x
GO
field type
---------- --------------------
123 numeric only
(1 row(s) affected)
/Kenneth
September 19, 2005 at 5:57 am
Yes, I also use patterns like '%[^0-9]%', but I had to add a lot more code to handle a single decimal point and commas spaced three digits apart, and I had to ensure the commas appeared only to the left of the decimal. I was surprised that I could not find a T-SQL function anywhere on the net to do this.
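Roughly, the extra comma checks come down to something like this (a simplified sketch of the idea, not my actual code; it assumes the dollar sign and the decimal part have already been split off, leaving only the integer part in @intpart):

declare @intpart varchar(40)
set @intpart = '1,012,000'

select case
         when @intpart = '' or @intpart like '%[^0-9,]%'   then 0  -- non-empty, digits and commas only
         when @intpart like ',%' or @intpart like '%,'     then 0  -- no leading or trailing comma
         when @intpart like '%,,%'                         then 0  -- no empty groups
         when @intpart like '%,[0-9]'
           or @intpart like '%,[0-9][0-9]'
           or @intpart like '%,[0-9],%'
           or @intpart like '%,[0-9][0-9],%'               then 0  -- a comma must be followed by 3 digits...
         when @intpart like '%,[0-9][0-9][0-9][0-9]%'      then 0  -- ...and no more than 3
         when @intpart like '%,%'
          and charindex(',', @intpart) > 4                 then 0  -- first group is at most 3 digits
         else 1
       end as comma_grouping_ok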
September 19, 2005 at 6:33 am
Well, in general the SISO(tm) algorithm is all too true in our line of business, so I try my hardest to apply the necessary amount of force on the party delivering the data. It's their job to deliver it in the required quality, not the receiver's (SQL in this case). It can be done, as you have proven, but it's not pretty, and it certainly isn't what SQL was designed to do, so I'm not that surprised you couldn't find any function for it.
/Kenneth
September 19, 2005 at 8:14 am
I am rather surprised by this response. I worked for a company that did account reconciliation, and if I had ever told my boss that the data came in bad and it was the Client's responsibility to get it right, I would have been out on my ear...
While I could not simply assume what was correct, it was my responsibility to find out what was right and get it into the proper format... Maybe you work in an area where you can make these demands of the Client, but I have had too many positions where that was definitely not the case...
Just my $ 0.02.
I wasn't born stupid - I had to study.
September 20, 2005 at 2:43 am
My 'clients' in this case are other in-house systems. If we make an arrangement to send/receive data to/from each other, it's crucial to lay out specs about what format the data is supposed to be in (i.e. file formats and such).
It's perfectly legal to demand (or 'wish', if you like) that e.g. monetary amounts be sent without spaces or other junk in there. Of course, we all know that the world isn't perfect; we still receive garbage in our data files and have to make do as best as possible, but that doesn't mean we shouldn't ask for clean data in the first place. Nine times out of ten we won't get it, but that's another story...
/Kenneth
September 20, 2005 at 9:16 am
You can rarely get away with saying "No can do" to an external client. You can try to tell them the data cleaning will make the schedule slip and cost more. Eventually they'll figure out it is cheaper to clean it themselves. For an internal client, depending on the company politics, I would tell them to clean their own data if I could get away with it.
As for technique, I would be tempted to use object automation calls to a regular expression object for this. The pattern "^\$?(\d+|\d{1,3}(,\d{3})*)(\.\d*)?$" does a pretty good job of recognizing numbers (with or without leading dollar sign, no embedded spaces, commas optional but must be used every 3 digits if present, optional decimal point, required digit before decimal, no exponentials). The LIKE patterns are just too limited.
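Something along these lines would do it (just a sketch: it assumes the OLE Automation procedures (sp_OA*) are enabled and that VBScript.RegExp is available on the server; error checks on @hr are omitted):

declare @regex int, @hr int, @match int
declare @input varchar(100)
set @input = '$1,234.56'

exec @hr = sp_OACreate 'VBScript.RegExp', @regex out
exec @hr = sp_OASetProperty @regex, 'Pattern', '^\$?(\d+|\d{1,3}(,\d{3})*)(\.\d*)?$'
exec @hr = sp_OAMethod @regex, 'Test', @match out, @input
exec sp_OADestroy @regex

select @input as input, @match as is_match   -- nonzero means the pattern matched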
September 20, 2005 at 9:46 am
Hi Scott,
I also was tempted to use RegExp, but I am afraid of the performance overhead from object instantiation in T-SQL. It's also not clear to me how RegExp performs compared to doing it with a more directed hard-coded technique (using LIKE or PATINDEX) that does not need to load such a big RegExp parsing engine.
So instead, I wrote a rather contorted function using different PATINDEX constructs. I will try to get around to posting the code later in the scripts section. I'm not sure if this is any better, but I know that it works with all of my test cases: it will not choke on commas and dollar signs, it requires that commas are properly positioned, and, at least at this point in my testing, it does not produce untrappable (i.e. guaranteed to immediately terminate your SP) CONVERT errors.
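The overall shape is roughly this (a stripped-down sketch with a made-up name, dbo.SafeToDecimal, not the script I submitted; the comma-placement checks sketched earlier would slot in before the commas are stripped):

create function dbo.SafeToDecimal (@s varchar(50))
returns decimal(19,4)
as
begin
    -- validate first and convert only when the string is known to be clean,
    -- so CONVERT never gets the chance to raise a batch-aborting error
    set @s = ltrim(rtrim(@s))
    if @s like '$%' set @s = substring(@s, 2, 50)   -- allow one leading dollar sign
    set @s = replace(@s, ',', '')                   -- commas assumed already validated for placement
    if @s like '%[^0-9.]%'  return null             -- only digits and a decimal point may remain
    if @s like '%.%.%'      return null             -- at most one decimal point
    if @s not like '[0-9]%' return null             -- must start with a digit
    if charindex('.', @s + '.') > 16 return null    -- integer part too long for decimal(19,4)
    return convert(decimal(19,4), @s)
end

With that shape, something like select dbo.SafeToDecimal('10E2') just comes back NULL instead of terminating the procedure.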
If I had more energy, I would compare it to a RegExp method.
September 20, 2005 at 10:52 am
OK, I submitted the script. If it is accepted, it should be available within 48 hrs.
Have a look if you're interested.