March 23, 2009 at 2:07 am
Hi,
I got a table with datas that has alphanumeric values like 1230-544,15C5487,132DE78.
Now i need to extract only integers fomr these datas and convert to bigint. The other characters have no impact in my query. Is there way to produce a data like that?
Thanks in advance.
Mouli
"I Love Walking In The Rain So No One Can See Me Crying ! " ~ Charlie Chaplin
March 23, 2009 at 2:27 am
Your post is not very clear. Do you want to get only the records that have only numbers or do you want to get all the records but remove the non numeric characters from your column? Also do you have a list of possible characters, or could that column have all possible characters?
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
March 23, 2009 at 2:34 am
Have a look at this
declare @STR varchar(100)
select @STR = '1230-544,15C5487,132DE78'
;With Breakdown as
(
Select
SubString(@Str,1,1)[Chr],
1[Idx]
Union All
Select
SubString(@Str,Idx+1,1),
Idx+1
from Breakdown
where (Idx+1)<=Len(@Str)
)
select chr from Breakdown where isnumeric(chr) = 1
Edit:- This will return ',' and '-'. You need to filter them out.
"Keep Trying"
March 23, 2009 at 5:24 pm
Chirag (3/23/2009)
Have a look at thisdeclare @STR varchar(100)
select @STR = '1230-544,15C5487,132DE78'
;With Breakdown as
(
Select
SubString(@Str,1,1)[Chr],
1[Idx]
Union All
Select
SubString(@Str,Idx+1,1),
Idx+1
from Breakdown
where (Idx+1)<=Len(@Str)
)
select chr from Breakdown where isnumeric(chr) = 1
Edit:- This will return ',' and '-'. You need to filter them out.
Hey there, ol' friend... you do realize that recursive CTE's are slower than even cursors, right?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 23, 2009 at 8:50 pm
Sorry... went to edit and ended up double posting, instead... please see below in my next post.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 23, 2009 at 8:52 pm
Mouli,
I believe this will solve your problem... no UDF overhead... no need for CROSS-APPLY. Just use the whole thing as another derived table. I suppose it could be a UDF or view, as well.
--===== Create and populate a test table with the data given in the post.
-- This is NOT a part of the solution
CREATE TABLE #YourTable
(
SomeString VARCHAR(20)
)
INSERT INTO #YourTable
SELECT '1230-544' UNION ALL
SELECT '15C5487' UNION ALL
SELECT '132DE78'
--===== This solves the problem.
;WITH
cteSplit AS
(--==== This not only splits out the individual characters, it only splits
-- out the digits from 0 to 9
SELECT SomeString,
ROW_NUMBER() OVER (ORDER BY yt.SomeString) AS CharacterNumber,
SUBSTRING(yt.SomeString,t.N,1) AS Character
FROM #YourTable yt
CROSS JOIN dbo.Tally t
WHERE t.N <= LEN(yt.SomeString)
AND SUBSTRING(yt.SomeString,t.N,1) LIKE '[0-9]'
)--==== This put's it all back together using a very high speed XML method
SELECT t1.SomeString,
CAST((SELECT '' + t2.Character
FROM cteSplit t2
WHERE t1.SomeString = t2.SomeString
ORDER BY t2.CharacterNumber
FOR XML PATH(''))
AS BIGINT) AS NumbersOnly
FROM cteSplit t1
GROUP BY t1.SomeString
If you don't already have a Tally table at hand, now's the time to build this incredibly useful tool. Read the article at the following link not only for how to build one, but to understand how it works, as well.
http://www.sqlservercentral.com/articles/TSQL/62867/
If, for some reason, folks won't let you build one in the database, please post back... there's more than one way to get around that.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 23, 2009 at 11:11 pm
Jeff,
I have used your script both in UDF and as a query. But the problem is when the table records exceeds 10000, there is a performance issue. It takes more than two mins.
This is my table:
DECLARE @intCount INT
SET @intCount = 10000
WHILE(@intCount>=1)
BEGIN
INSERT INTO #Temp (Data) SELECT '~`!@#$%^&*()_2+-={}[]:";<>,.?/|\",.~`'
SET @intCount = @intCount - 1
END
Also, i have tried out a function that i have created that has loop
CREATE FUNCTION dbo.ufn_ExtractNumbersFromText
(
@vchInput VARCHAR(200)
)
RETURNS VARCHAR(200)
AS
BEGIN
DECLARE @vchOutput VARCHAR(200)
SET @vchInput = LTRIM(RTRIM(@vchInput))
DECLARE @i INT
DECLARE @intCount INT
DECLARE @vchTemp VARCHAR(1)
SET @i = 1
SET @vchOutput = ''
SET @intCount = LEN(@vchInput)
WHILE(@intCount >= 1)
BEGIN
SET @vchTemp = SUBSTRING(@vchInput,@i,1)
SET @vchOutput= @vchOutput + CASE WHEN @vchTemp LIKE '[0-9]'
THEN @vchTemp ELSE '' END
SET @i = @i + 1
SET @intCount = @intCount - 1
END
RETURN @vchOutput
END
And this takes nearly 6 Secs For 20000.
Anyways the script you have sent can be modified or are there any other hidden advantage compared to my function.
"I Love Walking In The Rain So No One Can See Me Crying ! " ~ Charlie Chaplin
March 23, 2009 at 11:13 pm
Oops,
Forget to add the script of how the function is used on the table to get the required data.
SELECT * FROM #Temp
UPDATE #Temp SET Output = dbo.ufn_ExtractNumbersFromText(3,Data)
SELECT * FROM #Temp
Thanks
"I Love Walking In The Rain So No One Can See Me Crying ! " ~ Charlie Chaplin
March 23, 2009 at 11:56 pm
Jeff
I mistook this (1230-544,15C5487,132DE78) for a single value rather than 3 values that they are. I assume the op wanted to process one value.
Using a number table would be the best way of doing this.
Assuming you have a number table called number this is the query
select substring('1230-544',number,1)
from number where number <= len('1230-544')
and substring('1230-544',number,1) like '[%0-9%]'
"Keep Trying"
March 24, 2009 at 4:26 pm
Chirag (3/23/2009)
JeffI mistook this (1230-544,15C5487,132DE78) for a single value rather than 3 values that they are. I assume the op wanted to process one value.
Using a number table would be the best way of doing this.
Assuming you have a number table called number this is the query
select substring('1230-544',number,1)
from number where number <= len('1230-544')
and substring('1230-544',number,1) like '[%0-9%]'
Understood and thanks for the feeback, Chirag... just wanted you to know that recursion should be avoided even more than a well written cursor is.
Also, your query doesn't put it all back together like the op wanted and you don't need the % signs in the like because you're looking at a single character.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 24, 2009 at 4:29 pm
jchandramouli (3/23/2009)
Jeff,I have used your script both in UDF and as a query. But the problem is when the table records exceeds 10000, there is a performance issue. It takes more than two mins.
I check it when I get home from work tonight. Thanks for the feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 24, 2009 at 4:29 pm
And, yes... there's a few more tricks we can try.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 24, 2009 at 4:30 pm
One more thing.... can you post the code for the UDF that uses my method... and why are you using a UDF for this? It was meant to resolve the entire table at once. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
March 24, 2009 at 10:18 pm
Wow.... you're right. I've never seen the XML concatenation method run so slow. Now that we have the test data you want, I believe we can do a little better than the While Loop solution you have.
Here's your test data generator (I did a little "Mo-dee-can tweekin' on it) and your test table (best I can make out from your code)...
--===== Create the test table and populate it
CREATE TABLE #Temp (Data VARCHAR(100), OutPut BIGINT)
INSERT INTO #Temp (Data)
SELECT TOP 20000 '~`!@#$%^&*()_2+-={}[]:";<>,.?/|\",.~`'
FROM Master.sys.SysColumns sc1
CROSS JOIN Master.sys.SysColumns sc2
Here's my idea of what the function should look like....
CREATE FUNCTION dbo.DigitsOnly
(@SomeString VARCHAR(8000))
RETURNS BIGINT
AS
BEGIN
DECLARE @CleanString VARCHAR(8000)
SET @CleanString = ''
SELECT @CleanString = @CleanString + SUBSTRING(@SomeString,t.N,1)
FROM dbo.Tally t
WHERE t.N <= LEN(@SomeString)
AND SUBSTRING(@SomeString,t.N,1) LIKE '[0-9]'
RETURN CAST(@CleanString AS BIGINT)
END
Notice that it uses a Tally table... the following article explains not only how to build one, but also how it replaces a While Loop... it's got a heck of a lot more uses than just what show up in that article...
http://www.sqlservercentral.com/articles/TSQL/62867/
... And, here's the code to use the function...
SELECT * FROM #Temp
UPDATE #Temp SET Output = dbo.DigitsOnly(Data)
SELECT * FROM #Temp
Including the two selects, the While loop version takes between 9 and 10 seconds on my box. The Tally table version takes between 4 and 5.
Here's another test table you can run... same everything except different data...
drop table #Temp
go
--===== Create the test table and populate it
CREATE TABLE #Temp (Data VARCHAR(100), OutPut BIGINT)
INSERT INTO #Temp (Data)
SELECT TOP 20000 '~`!@#$%^&*()_2+-={}[]:";<>,.?/|\",.~`'+RIGHT(NEWID(),12)
FROM Master.sys.SysColumns sc1
CROSS JOIN Master.sys.SysColumns sc2
Both runs take about the same time as the prevously did with the Tally table version being almost twice as fast.
Hope all that helps.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 25, 2009 at 12:13 am
Jeff Moden (3/24/2009)
Chirag (3/23/2009)
Jeffselect substring('1230-544',number,1)
from number where number <= len('1230-544')
and substring('1230-544',number,1) like '[%0-9%]'
Also, your query doesn't put it all back together like the op wanted and you don't need the % signs in the like because you're looking at a single character.
You are right.
"Keep Trying"
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply