September 17, 2012 at 11:17 am
Very nicely thought indeed. Great 😎 Not to mention how tricky it is to understand NOT LIKE %[NOT DIGITS]% 😉 Also the reasons why ISNUMERIC fails (do not work as desired ) are very informative.
Only one thing I wanted to say is that requests for handling decimal and positive/negative numbers seems quite rightful to me (May be because I had to deal with such scenario before :-D)
Talking of inline code OR iSF, may be something like the following could be a starting point
/*======= POPULATION OF TEMPORARY DATA =======*/
IF OBJECT_ID('tempdb..#test') IS NOT NULL
DROP TABLE #test
GO
CREATE TABLE #test ( string VARCHAR(10) )
INSERT INTO #test
SELECT '12334'
UNION ALL SELECT 'abc1234'
UNION ALL SELECT '123.4'
UNION ALL SELECT '-1233.'
UNION ALL SELECT '-123.33'
UNION ALL SELECT '-123.33.'
UNION ALL SELECT '-123.33-'
UNION ALL SELECT '.123'
UNION ALL SELECT ''
UNION ALL SELECT '123.33-'
UNION ALL SELECT '123.33+'
UNION ALL SELECT '-123.33+'
/*========================================*/
SELECT *
FROM #test
/*========= TRIM THE STRING AND REPLACE ANY UNWANTED CHARACTERS ===========================================*/
CROSS APPLY ( SELECTREPLACE(REPLACE(REPLACE(REPLACE(LTRIM(RTRIM([string])), --TRIM TRAILING SPACES
CHAR(9),''), --REPLACE ALL TABS WITH ZERO LENGTH STRING
CHAR(10), ''), --REPLACE ALL LINE FEEDS WITH ZERO LENGTH STRING
CHAR(13), ''), --REPLACE ALL CARRIAGE RETURNS WITH ZERO LENGTH STRING
CHAR(32),'') --REPLACE ALL SINGLE SPACES WITH ZERO LENGTH STRING
) TR ( TrimmedString )
CROSS APPLY ( SELECT LEFT(TrimmedString, 1), --FIRST CHARACTER OF THE STRING
RIGHT(TrimmedString, 1) --LAST CHARACTER OF THE STRING
) FLCHR ( FirstCharacter, LastCharacter )
/*========= +VE and -VE SIGNs ARE REMOVED ONLY IF THOSE ARE FIRST OR LAST CHARACTERS ========================*/
CROSS APPLY ( SELECT CASE WHEN FirstCharacter IN ('-','+') --CHECK IF FIRST LETTER OF STRING IS EQUAL TO NEGATIVE SIGN "-" OR POSITIVE SIGN "+"
AND LastCharacter NOT IN ('-','+') --AND LAST LETTER IS NOT EQUAL TO NEGATIVE SIGN "-" OR POSITIVE SIGN "+"
THEN RIGHT(TrimmedString, LEN(TrimmedString) - 1) --THEN REMOVE THE FIRST CHARACTER
WHEN LastCharacter IN ('-','+') --CHECK IF LAST LETTER OF STRING IS EQUAL TO NEGATIVE SIGN "-" OR POSITIVE SIGN "+"
AND FirstCharacter NOT IN ('-','+') --AND FIRST LETTER IS NOT EQUAL TO NEGATIVE SIGN "-" OR POSITIVE SIGN "+"
THEN LEFT(TrimmedString, LEN(TrimmedString) - 1) --THEN REMOVE THE LAST CHARACTER
ELSE TrimmedString -- OTHERWISE KEEP THE ACTUAL STRING
END
) SWAS ( StringWithoutAnySIGN )
/*========== DECIMAL IS REMOVED IF THERE IS ONLY ONE INSTANCE OF IT. OTHERWISE THE INPUT SHOULD BE CONSIDERED AS WRONG ===*/
CROSS APPLY ( SELECT CASE WHEN CHARINDEX('.', StringWithoutAnySIGN,
ISNULL(NULLIF(CHARINDEX('.', StringWithoutAnySIGN),0)+1,
LEN(StringWithoutAnySIGN)+ 1)) > 0 --IF THERE ARE MULTIPLE OCCURENCES OF DECIMALS i.e. "."
THEN StringWithoutAnySIGN -- KEEP THE NON NEGATIVE STRING
ELSE REPLACE(StringWithoutAnySIGN, '.', '') --OTHERWISE REPLACE THE DECIMAL FROM THE STRING
END
) y ( FinalString )
/*======= -- UNCOMMENT THE FOLLOWING BLOCK TO APPLY THE FILTER ==*/
--WHERE [y].[FinalString] NOT LIKE '%[^0-9]%'
--AND LEN([y].[FinalString]) -- ONLY STRINGS WITH MORE THAN ZERO CHARACTER ARE TO BE INCLUDED.
--> 0 -- ALSO LEN FUNCTION DO NOT COUNT TRAILING SPACES
/*===============================================================*/
It is all according to what I thought could be my requirements. Everyone can have his/her own requirements. Like having +/- at the end of the string may be invalid to someone. So your IsAllDigits function seems to be a more generic approach and can be easily modified such like I did.
Another thing I thought of that currency part can be handled the same way. Either it can be starting character OR can be a second character if the +/- are the first characters. Vice versa is the case for +/- in case of first and second characters while dealing with currency one character abbreviations. (Although in my case it was 3 characters abbreviation ISO standard and needed quite a bit of manual intervention :angry:)
September 17, 2012 at 11:37 am
SQL Kiwi (9/17/2012)
In any case, the assumption must be that TRY_* will only be used where necessary, and (relatively) slow as the intrinsic functions may be, it will likely out-perform any other T-SQL method, while having the advantage of being presumably very well-tested.
Very valuable point indeed. It would have been tested in a much better way. I hope there are no bugs left in those functions 🙂
But one thing I cannot understand is that if these functions are CLRs then why they cannot be part of earlier versions? Why only SQL 2012? I know MS could come up with various reasons and may could need manual intervention, but still could have been possible? Any way of finding the respective DLLs kind of things and register? Is it legitimate (I guess it would not be)? May be remembering the exercise of such data cleansing is making me more pessimistic :crazy:
September 17, 2012 at 12:25 pm
Brandy
I do not question Jeff merits and impressive contributions, nor I question yours. honestly, no sarcasm here. But I did feel his response as being as rude to me in the first place as my answer was to him. It is your opinion it was not, but you cannot know how I felt about that.
If I may have offended Jeff I do apologise to him, but again I think that is for Jeff to say.
On top of that, do you mean that because I am a rookie my point is not valid? What is my skill level is not for me to say but I do not think it is healthy to discuss the validity of a technical point based on someone's resume or community contributions background rather than proof.
Besides:
- I still think that what Jeff was asking me to code should rather be done using regex.
- I still think that the title of the article is rather unfortunate and I do not agree with Jeff explanation of why he choose it. However that was his call and this is my personal opinion.
September 17, 2012 at 1:07 pm
SQL Kiwi (9/17/2012)
[...] the issues in an interpreted language like T-SQL can be quite different from those in compiled languages like .NET. TRY_PARSE has more overhead than the others, mostly because it is more powerful and calls into the CLR, but a quick test on a million rows showed TRY_CAST on a 99%-failing conversion to run in 240ms compared with 200ms on an equivalent query without the call. Where all values pass the CAST, there was no measurable difference at all.In any case, the assumption must be that TRY_* will only be used where necessary, and (relatively) slow as the intrinsic functions may be, it will likely out-perform any other T-SQL method, while having the advantage of being presumably very well-tested.
Totally agree. I would definitely use it without real concern -- unlike implementing in CLR extensions myself.
And it is also much easier to read/maintain than any other method ;-). Honestly I would echo somebody else on this thread in saying I can't understand why MS didn't do it earlier.
>L<
September 17, 2012 at 1:25 pm
josuecalvo (9/17/2012)
BrandyI do not question Jeff merits and impressive contributions, nor I question yours. honestly, no sarcasm here. But I did feel his response as being as rude to me in the first place as my answer was to him. It is your opinion it was not, but you cannot know how I felt about that.
If I may have offended Jeff I do apologise to him, but again I think that is for Jeff to say.
On top of that, do you mean that because I am a rookie my point is not valid? What is my skill level is not for me to say but I do not think it is healthy to discuss the validity of a technical point based on someone's resume or community contributions background rather than proof.
Besides:
- I still think that what Jeff was asking me to code should rather be done using regex.
- I still think that the title of the article is rather unfortunate and I do not agree with Jeff explanation of why he choose it. However that was his call and this is my personal opinion.
Nope... no offense taken. You stated that this could be done with a conversion to the money datatype and I'm still looking forward to that code.
Additionally, from previous tests done, I still believe that RegEx will be slower and unnecessary in this case. If you'd be willing to do a test on your machine, I could easily provide you with a million row test table so we can find out.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 17, 2012 at 2:28 pm
I am glad we are cool. 🙂
The first code I submitted worked for me to clean up some data for my company. In my case the goal was not to identify what was a valid number following a given criteria but rather getting the numeric values out of the data where existing. That was a one off so I did not pay excessive attention to come up with the best possible code but rather to get the job done. I guess that that may be the case of many of us on our day jobs, so that is why I submitted my code in the first place 😉
My code to extract the human readable numeric values would be:
[font="Courier New"]select
isnumeric([Value]),
case when isnumeric([Value]) = 1 and [Value] not like '%[a-z]%' then convert(float, convert(money, [Value])) end
from
(
select [Value] = '$1,234.12'
union all
select '£1234'
union all
select '£1,23.4'
union all
select '1.234'
union all
select '12,34'
union all
select '12c34'
union all
select '12e34'
) x
[/font]
I have used RegEx several times to extract and clean up data for small to medium sized record sets. It has always be up to the job despite being used straight on SQL CLR. And the flexibility of RegEx is a plus no one can deny. However, in order to get the best of it as a core feature for an system I would not run it from SQL though, but rather from the .net framework using SSIS or C#.
Despite that I must confess I am a fanatic of TSQL and I find easier to do these things straight on the SQL engine. Please, send me the sample data so I can play with it a check how RegEx would perform.
September 17, 2012 at 8:33 pm
Usman Butt (9/17/2012)
SQL Kiwi (9/17/2012)
In any case, the assumption must be that TRY_* will only be used where necessary, and (relatively) slow as the intrinsic functions may be, it will likely out-perform any other T-SQL method, while having the advantage of being presumably very well-tested.But one thing I cannot understand is that if these functions are CLRs then why they cannot be part of earlier versions?
Only TRY_PARSE is hosted by the CLR, the other two are T-SQL intrinsic functions. It would be quite unusual to add T-SQL language features to old versions of the product through a service pack. Perhaps there is a commercial element too - all part of the reasons to upgrade!
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
September 17, 2012 at 8:36 pm
Lisa Slater Nicholls (9/17/2012)
Honestly I would echo somebody else on this thread in saying I can't understand why MS didn't do it earlier.
My guess is they just didn't think of it as important enough. The implemented function bears a very close resemblance to the Connect suggestion Hugo mentioned ages back:
http://connect.microsoft.com/SQLServer/feedback/details/354766/add-a-is-valid-convert-function
So perhaps this shows the importance of submitting feedback that way?
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
December 25, 2012 at 8:42 am
Good article.
December 25, 2012 at 10:36 am
Thank you for your feedback, Neha05.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 9, 2013 at 11:44 am
I've read the article and skimmed through all the comments looking if there was something that talked about how to actually do the cross apply as Jeff suggests, but I haven't seen an example of it.
I'm actually trying to implement this, but I'm just not getting it and was wondering if someone could help me out.
Here's the problem I'm trying to solve
I've got a list of product numbers that I need to validate are exclusively digits.
I created the function as specified ( renamed it to fn_isAllDigits to adhere to our naming standards and am trying to use it but I guess I just don't understand the how to properly format the query to use the cross apply.... Can someone help here's what I'm starting with:
Note: In this version of the query I'm trying to drill in and find all product numbers that are six digits long. The latest request was to include any rows that are longer than 6 digits but have leading zeroes that should also be included. eg. 00000001234546 should also be included) The simple solution seemed to be to just convert it to an int and then check the length as you see I attempted because the patindex check did not catch this: 0-3-101
SELECT DISTINCT PrimeProdNo
FROM ourDatabase.schema.tbl_product tp1
WHERE (6 = LEN(RTRIM(PrimeProdNo))
OR 6 = LEN(RTRIM(CAST(PrimeProdNo AS BIGINT)))) --I just added this to
AND PATINDEX('%[A-Z]%', PrimeProdNo) = 0 --this needs to be more than just a letter check
AND LEFT(PrimeProdNo, 1) > 0
Side Note 2. I got this code from someone else, but I was thinking about changing that from distinct to a group by on PrimeProdNo. Anyone have any comments on which is the better method to employ?
January 9, 2013 at 12:10 pm
Brett Phipps (1/9/2013)
I've read the article and skimmed through all the comments looking if there was something that talked about how to actually do the cross apply as Jeff suggests, but I haven't seen an example of it.I'm actually trying to implement this, but I'm just not getting it and was wondering if someone could help me out.
Here's the problem I'm trying to solve
I've got a list of product numbers that I need to validate are exclusively digits.
I created the function as specified ( renamed it to fn_isAllDigits to adhere to our naming standards and am trying to use it but I guess I just don't understand the how to properly format the query to use the cross apply.... Can someone help here's what I'm starting with:
Note: In this version of the query I'm trying to drill in and find all product numbers that are six digits long. The latest request was to include any rows that are longer than 6 digits but have leading zeroes that should also be included. eg. 00000001234546 should also be included) The simple solution seemed to be to just convert it to an int and then check the length as you see I attempted because the patindex check did not catch this: 0-3-101
SELECT DISTINCT PrimeProdNo
FROM ourDatabase.schema.tbl_product tp1
WHERE (6 = LEN(RTRIM(PrimeProdNo))
OR 6 = LEN(RTRIM(CAST(PrimeProdNo AS BIGINT)))) --I just added this to
AND PATINDEX('%[A-Z]%', PrimeProdNo) = 0 --this needs to be more than just a letter check
AND LEFT(PrimeProdNo, 1) > 0
Side Note 2. I got this code from someone else, but I was thinking about changing that from distinct to a group by on PrimeProdNo. Anyone have any comments on which is the better method to employ?
Brett, your best bet is to take this to the T-SQL forum. Paste in your function code (refer by link back to Jeff's article if you need to), paste in your query code, paste in table DDL and an INSERT statement with sample data. With all that information, we will be able to help you a lot better. Plus you'll get a lot more attention there with a new post than on a thread this old.
January 9, 2013 at 12:36 pm
Brett this is actually pretty simple. I included some sample data using your names and even included the condition of len = 6. for numbers though we don't want to use len, we want to use a range of values.
;with tbl_product(PrimeProdNo) as
(
select '383745' union all --valid
select '093847' union all --this is not length 6 so should not return
select '1234567' union all --this is length 7 so should not return
select 'asdf3e5' union all --non numerics
select '12(854' union all --non numerics
select '867596' union all --valid
select '0-3-101' --invalid
)
, tempResult as
(
select *
from tbl_product
cross apply dbo.IsAllDigits(PrimeProdNo)
)
select * from tempResult
where IsAllDigits = 1
and PrimeProdNo > 99999
and PrimeProdNo <= 999999
The reason we can safely do this type of range validation now is because we have stripped out any rows that are not numeric in the cte so the implicit conversion in the where clause will be successful.
You might want to take a look at Paul White's articles on using and understanding apply.
http://www.sqlservercentral.com/articles/APPLY/69953/[/url]
http://www.sqlservercentral.com/articles/APPLY/69954/[/url]
--EDIT--
In retrospect we don't need the second cte. The non-numeric value will already be removed. The following will also work.
select *
from tbl_product
cross apply dbo.IsAllDigits(PrimeProdNo)
where IsAllDigits = 1
and PrimeProdNo > 99999
and PrimeProdNo <= 999999
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
January 9, 2013 at 1:30 pm
I'm not sure if I should respond or do as Brandie suggests, but since I've gotten a response here. I'm going to go ahead and post the create for my table
CREATE TABLE [ret].[tbl_product](
[LocationNo] [decimal](7, 0) NOT NULL,
[PrimeProdNo] [char](16) NOT NULL,
[ProdCategory] [char](3) NOT NULL,
[ProdCategoryHomeOffice] [char](3) NOT NULL,
[MFA] [varchar](5) NULL,
[ProdDesc] [char](30) NOT NULL,
[ProdDescHomeOffice] [char](30) NOT NULL,
[QuantityOnHand] [decimal](7, 0) NULL,
[ProdUomSale] [char](5) NOT NULL,
[ProdUomPurchase] [char](5) NOT NULL,
[ProdUomA] [char](5) NOT NULL,
[Density] [decimal](6, 2) NULL,
[RateA] [decimal](6, 2) NULL,
[LastSaleDate] [date] NULL,
[LastReceiptDate] [date] NULL,
[AddedDate] [date] NULL
) ON [PRIMARY]
GO
This table was created from a cobol file and loaded with data from over a hundred different stores who have each entered data in a myriad of methods. We are trying to find all the distinct differences so we can create one new file that will be pushed back out to the stores.
I've attempted using Sean's method above but when I switch to using the real table I get "Conversion failed when converting the varchar value '00161B ' to data type int.
When I use the example below it works but it doesn't allow me to filter out the products that are less than 100000. If i enable the
select *
from ret.tbl_product
cross apply Utilities.dbo.fn_isAllDigits(RTRIM(PrimeProdNo))
where IsAllDigits = 1
--and PrimeProdNo BETWEEN 100000 and 999999
AND (6 = LEN(RTRIM(PrimeProdNo))
--OR 6 = LEN(RTRIM(CAST(PrimeProdNo AS BIGINT)))
)
If I try to enable the check for PrimeProdNo I get the following error.
Msg 245, Level 16, State 1, Line 25
Conversion failed when converting the varchar value '3EHVHD ' to data type int.
Msg 248, Level 16, State 1, Line 25
The conversion of the varchar value '5510708105 ' overflowed an int column.
so I did this to test if it's an issue with the table layout and proved that it is:
IF OBJECT_ID('tempdb..#tbl_product') IS NOT NULL
DROP TABLE #tbl_product
CREATE TABLE #tbl_product(
[PrimeProdNo] [char](16) NOT NULL
) ON [PRIMARY]
INSERT INTO #tbl_product
(PrimeProdNo)
select '383745' union all --valid
select '093847' union all --this is not length 6 so should not return
select '1234567' union all --this is length 7 so should not return
select 'asdf3e5' union all --non numerics
select '12(854' union all --non numerics
select '867596' union all --valid
select '0-3-101' UNION ALL --invalid
SELECT '0-3-101 ' UNION ALL
SELECT '00161B '
SELECT * FROM #tbl_product AS tp
select *
from #tbl_product
cross apply Utilities.dbo.fn_isAllDigits(RTRIM(PrimeProdNo))
where IsAllDigits = 1
and PrimeProdNo BETWEEN 100000 and 999999
AND (6 = LEN(RTRIM(PrimeProdNo))
--OR 6 = LEN(RTRIM(CAST(PrimeProdNo AS BIGINT)))
)
Using this you will get the same conversion error I'm running into. I suspect this is because the field is defined as a char instead of varchar.
January 9, 2013 at 3:42 pm
Hi Brett,
The problem is that you are combining a check for all numbers and a cast to integer in a single query. The optimizer is free to carry out those opoerations in any order - and in this case, it chooses to do the converstion first, then do the user-defined function.
One way to work around this would be to use CASE, where a THEN clause should only be evaluated if the corresponding WHEN clause is true. You would get something like
select *
from ret.tbl_product
CROSS APPLY Utilities.dbo.fn_isAllDigits(RTRIM(PrimeProdNo))
where IsAllDigits = 1
--and PrimeProdNo BETWEEN 100000 and 999999
AND (6 = LEN(RTRIM(PrimeProdNo))
OR 6 = CASE WHEN IsAllDigits = 1 THEN LEN(RTRIM(CAST(PrimeProdNo AS BIGINT))) ELSE 0 END
)
Viewing 15 posts - 136 through 150 (of 168 total)
You must be logged in to reply to this topic. Login to reply