January 30, 2007 at 9:49 am
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/sjones/2843.asp
January 30, 2007 at 10:00 am
Nice article.
So what's the next tame those strings article in this serie?
February 12, 2007 at 11:23 pm
Being an old UNIX hack, I would have preferred to see the CHR(13) and CHR(10) in seperate REPLACEs. That would allow it to work with data from other sources than Windows.
Also, it would have been possible to use the CR themselves as your "markers", though this would rely on the format being consistent (allways a bad idea IMHO).
All in all - string manipulation is a pain. But what it comes down to is seeing the pattern, and writing your code to be as flexible as possible in dealing with it.
My application UI has a small function in JavaScript which goed through any text field that is being entered and removes a whole bunch of "illegal" character strings (Copy and paste is the devil's work and can introduce so many fun strings, e.g. Tab and other unprintable characters). About once every 6 months or so I discover a new character that we have to add to the forbidden list. I suspect Smilies will be next!
Good to see that someone is still paying attention to the basics!
February 13, 2007 at 12:11 am
Toby,
When you are ready to post the self learning "stripper" code, I am ready to take a look.
Pun intended.
Andy
February 13, 2007 at 3:09 am
I love these puzzles. If the objective is to get at those pesky numbers, how about this? The code is slightly more complex so as to cope with those records that have a blank string or no number! (this sort of routine can do all sorts of string manipulation). I added the test code so one can try out different approaches.
DECLARE @rawData TABLE (TheID INT IDENTITY(1,1),String VARCHAR(255)) INSERT INTO @rawData (string) SELECT 'We need to be sure that this purchase order is processed: 34030 Please check on the status' INSERT INTO @rawData (string) SELECT 'We need to be sure that this purchase order is processed: 34031 Please check on the status' INSERT INTO @rawData (string) SELECT 'We need to be sure that this purchase order is processed: 34032 Please check on the status' INSERT INTO @rawData (string) SELECT 'We need to be sure that this purchase order is processed: 35932 Please check on the status' INSERT INTO @rawData (string) SELECT 'We need to be sure that this purchase order is processed: 35939 Please check on the status' INSERT INTO @rawData (string) SELECT 'We need to be sure that this purchase order is processed: Please check on the status' /* to get the first number out of a string (0 if it cannot find one). Normally one would put it in a function for clarity. */ SELECT LEFT(start+' ',PATINDEX('%[^0-9]%',start+' ')) FROM (SELECT [Start]=SUBSTRING(string+'0', PATINDEX('%[0-9]%', string+'0'), LEN(string+'0')) FROM @RawData )f
Best wishes,
Phil Factor
February 13, 2007 at 9:16 am
Here's a function which will extract the 1st, 2nd, 3rd or whatever number string, optionally with a length greater than whatever you specify (say you only want strings which are 5 or longer and want to ignore street addresses):
SET
QUOTED_IDENTIFIER ON
GO
SET
ANSI_NULLS ON
GO
/****** Object: User Defined Function dbo.fnExtractNumberStrings Script Date: 03/05/2006 13:52:49 ******/
if
exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[fnExtractNumberStrings]') and xtype in (N'FN', N'IF', N'TF'))
drop
function [dbo].[fnExtractNumberStrings]
GO
CREATE
function fnExtractNumberStrings
/**********************************************************************
Purpose : Extracts the required numeric string from a given input string
Author : Caroline Barnard (Airborne Consulting)
Created : May 2006
Version : 1.00
Changed :
***********************************************************************/
(
@inputstring
varchar(255),
@strno
int,
@strsize
int
)
returns
varchar(30)
as
/* Used to fix non-normalised data. @inputstring is the string to be passed to the function
** @strno is the instance of numeric data to be returned, so the caller can specify the 1st, 2nd
** 3rd numeric string etc, and would generally call it repeatedly.
** @strsize is the minimum size of the string to be returned and defaults to 3. If, for example, the
** caller requests the 3rd string with this parameter set to 4, and passes the following data in:
** AB12345CD 12DE7891%$55678 ,
** the function will return 55678 as the 3rd numeric string because the 2nd string (12) has fewer than 4 digits and will
** be ignored
*/
begin
declare
@outnumber varchar(30),
@char1
char(1),
@dlen
int,
@dlenremains
int,
@ctr
int,
@substart
int,
@sublen
int,
@subend
int,
@wipstring
varchar(255)
if isnull(@inputstring,' ') = ' ' --input parameter not supplied with valid value
begin
set @outnumber = 'ERROR'
return @outnumber
end
select
@wipstring = ltrim(rtrim(@inputstring))
select
@dlen = len(@wipstring)
select
@ctr = 1
--
--search for numerical string
--
while
@ctr <= @strno
begin
select @substart = PATINDEX ( '%[0-9]%' , @wipstring ) --look for next numeric string
if @substart > 0 --numeric data found
begin
--
--Calculate length of string from 1st numerical position to end
--remove all leading non-numeric characters
--
select @dlenremains = @dlen + 1 - @substart
select @wipstring = right(@wipstring,@dlenremains )
select @dlen = @dlenremains
--
--Now find end of number string
--
select @subend = PATINDEX ( '%[^0-9]%' , @wipstring ) --next non-numeric character
if @subend = 0 --no non-numeric data left
select @subend = @dlen + 1 --fake the "endpoint"
if @subend >= @strsize + 1 --the minimum string size is met so we output the string
begin
select @outnumber = substring(@wipstring, 1, @subend - 1) --populate the output number
end
else
begin
select @outnumber = 'NOT FOUND' --no numeric string found on this iteration
select @ctr = @ctr - 1 --subtract one from loop counter to compensate for increment
end
--
--strip off the number just found
--
select @dlenremains = @dlenremains - @subend + 1
select @wipstring = right(@wipstring,@dlenremains )
select @dlen = @dlenremains
end
else
begin
select @outnumber = 'NOT FOUND' --no numeric string found
select @ctr = @strno + 1 --exit the loop
end
select @ctr = @ctr + 1
end
return
@outnumber
end
GO
SET
QUOTED_IDENTIFIER OFF
GO
SET
ANSI_NULLS ON
GO
February 13, 2007 at 9:48 am
Good!
This is exactly what I was going to suggest. To use the function. Maybe I would input a string into the function, and check for each character using IsNumeric() function.
Regards,Yelena Varsha
February 13, 2007 at 10:37 am
I'll admit that I don't have all of the ASCII codes memorized. However, we've reached a point where rote memorization of such things is silly. If you need it look it up:
Though, there are a few which get memorized because you use them too often, 13 and 9 decimal are good examples.
That said, good article, wrangling in strings can be an annoying task.
February 13, 2007 at 11:32 am
I agree, whenever I use CHR(13) and CHR(10) I always treat them as separate characters. Even in the windows environment you can never be sure that a return will be the combination of these two, it just depends on the application which created the file.
February 13, 2007 at 11:33 am
I have come up with a more general function with some more features such as case sensitive and optional replacement character.
CREATE FUNCTION dbo.fnFilterString
(
@Source VARCHAR(8000), @Filter VARCHAR(8000), @Replacement VARCHAR(8000) ) RETURNS VARCHAR(8000) AS BEGIN DECLARE@Index SMALLINT
SET@Index = DATALENGTH(@Source)
WHILE @Index > 0 IF SUBSTRING(@Source, @Index, 1) COLLATE Latin1_General_BIN LIKE @Filter SET@Index = @Index - 1 ELSE SELECT@Source = STUFF(@Source, @Index, 1, @Replacement), @Index = @Index - 1
RETURN @Source END
Use this test code!
declare @s varchar(500) select @s = 'Peso' + char(10) + 'Helsingborg' + CHAR(13) + 'Sweden' + CHAR(13) + CHAR(10) + 'SQL'
print @s print ''
print dbo.fnFilterString(@s, '[^' + CHAR(13) + CHAR(10) + ']', '_') print dbo.fnFilterString(@s, '[e]', '*') print dbo.fnFilterString(@s, '[^n]', '')
N 56°04'39.16"
E 12°55'05.25"
February 13, 2007 at 12:17 pm
What about simply using a Regex function? As discussed here on MSDN,
http://msdn.microsoft.com/msdnmag/issues/07/02/SQLRegex/default.aspx
you can create a CLR function that just passes along the parameters to the .NET regex methods. Save yourself the trouble of reimplementing complex string matching and replacement algorithms, when the functionality can be added so easily to any SQL Server 2005 instance.
Rick
townsends.ca
February 13, 2007 at 3:17 pm
I think the best would be to change the table design and normalize this thing a little to put order numbers in one table and standard text in another message table?
Regards,Yelena Varsha
February 16, 2007 at 3:50 pm
Steve, I'm confused on how you loaded the data into SQL Server in the first place using DTS. How did it know that the three lines were supposed to be part of the same record?
Aunt Kathi Data Platform MVP
Author of Expert T-SQL Window Functions
Simple-Talk Editor
February 4, 2008 at 1:02 pm
In looking at the problem, I thought to extend it to extracting all of the numbers in a string, not just the first. Then you could choose to deal with the first or the nth.
This function only deals with int data, thus calling it for a string that held "Invoice total is $123.45' will return 123 and 45. There are other number patterns such as IP that would also need an alternate pattern handling and it will only pull the first number in a SSN type pattern as it tries to allow negative numbers but doesn't know what 333-22-4444 means... ( 333, -22, -4444 or 333, 22, 4444?)
String parsing is fun, but a lack of domain makes for unpredictable results...
Cheers!
CREATE FUNCTION dbo.listInt( @value varchar(8000) )
RETURNS @out table (
ordinal int identity(1,1) primary key,
number int
)
AS
BEGIN
INSERT INTO @out ( number )
SELECT convert( int, substring(
@value,
n,
patindex('%[0-9][^0-9]%', substring( @value+' ', n, 8000 ) ) -- find this number's end from its start (and thus length)
))
FROM
(
SELECT (1 +n1.n +N10.n +n100.n +n1000.n) AS n
FROM
(select 0 as n union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) as n1
cross join (select 0 as n union select 10 union select 20 union select 30 union select 40 union select 50 union select 60 union select 70 union select 80 union select 90) as n10
cross join (select 0 as n union select 100 union select 200 union select 300 union select 400 union select 500 union select 600 union select 700 union select 800 union select 900) as n100
cross join (select 0 as n union select 1000 union select 2000 union select 3000 union select 4000 union select 5000 union select 6000 union select 7000 ) as n1000
WHERE
(1 +n1.n +N10.n +n100.n +n1000.n) BETWEEN 1 AND len(@value) -- only indexes for characters that exist
) AS numbers
WHERE
substring(' '+@value, n, 2) LIKE '[^-0-9][-0-9]' -- each number's start
ORDER BY
n ASC
RETURN
END
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply