September 21, 2004 at 4:09 am
I have a nice problem. There is a char(255) field in a 3rd-party database that contains, among other things, an IP address that I would like to extract. But the IP address does not appear in the field at any fixed location; nor is it always preceded or followed by anydistinguishing characters.
How can I extract the IP address easily?
Regular expressions is one way, but all of the implementations I've found use functions, and I am on a SQL 7 server, so I cannot set up any user-defined functions.
Any assistance would be wolcome.
-steve
September 21, 2004 at 4:54 am
Easily? Without any distinguishing charachteristics? Can't be done "easily".
Anyway, you'll have to loop through the entire string until you find the ip (or at least by best guess believe you have found it)
The thought is this - find the first period in the string, see if the preceding 3 charachters are numbers, if they are, we hope this is the first octet in the ip, so we grab it from there. If it's not a number, we go look for the next period in the string. Repeat until there are no more periods.
Since there can't be a function, you may implement it as a stored procedure instead..
-- example
declare @s-2 varchar(255), @i int, @j-2 int
set @s-2 = 'long.. . . string from 123.123.123.123 that is placed here by chance. It could very well be somewhere else as well.'
set @i = 1
set @j-2 = len(@s)
while @i <= @j-2
begin
if (select isnumeric(substring(@s, (charindex('.', @s-2, @i) - 3), 3))) = 1 -- triplet is a number
begin
select substring(@s, (charindex('.', @s-2, @i) - 3), 15) -- grab the ip string
set @i = @j-2
end
else if isnumeric(substring(@s, (charindex('.', @s-2, @i) - 3), 3)) = 0 -- not a number
begin
set @i = @i + charindex('.', @s-2, @i) -- move offset so we can find the next period
end
end
Good luck
/Kenneth
September 21, 2004 at 7:38 am
Try this for a laugh
SELECT COALESCE(
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9]%',[column]),15),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9]%',[column]),14),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9]%',[column]),13),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9][0-9]%',[column]),14),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]%',[column]),13),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9].[0-9][0-9][0-9]%',[column]),13),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9].[0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9].[0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9]%',[column]),14),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9]%',[column]),13),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9][0-9].[0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9]%',[column]),13),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9].[0-9].[0-9][0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9].[0-9].[0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9][0-9].[0-9].[0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9].[0-9][0-9][0-9].[0-9][0-9][0-9]%',[column]),13),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9].[0-9][0-9][0-9].[0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9].[0-9][0-9][0-9].[0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9].[0-9][0-9].[0-9][0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9].[0-9][0-9].[0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9].[0-9][0-9].[0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9].[0-9].[0-9][0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9].[0-9].[0-9][0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9][0-9][0-9].[0-9].[0-9].[0-9]%',[column]),9),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9]%',[column]),14),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9]%',[column]),13),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9][0-9]%',[column]),13),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9][0-9].[0-9].[0-9][0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9][0-9].[0-9].[0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9][0-9].[0-9].[0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9]%',[column]),13),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9].[0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9].[0-9].[0-9][0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9].[0-9].[0-9][0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9][0-9].[0-9].[0-9]%',[column]),9),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9].[0-9][0-9][0-9].[0-9][0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9].[0-9][0-9][0-9].[0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9].[0-9][0-9][0-9].[0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9].[0-9][0-9].[0-9][0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9].[0-9][0-9].[0-9][0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9].[0-9][0-9].[0-9]%',[column]),9),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9].[0-9].[0-9][0-9][0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9].[0-9].[0-9][0-9]%',[column]),9),
SUBSTRING([column],PATINDEX('%[0-9][0-9].[0-9].[0-9].[0-9]%',[column]),8),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9]%',[column]),13),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9][0-9].[0-9][0-9][0-9].[0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9][0-9].[0-9][0-9].[0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9][0-9].[0-9].[0-9][0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9][0-9].[0-9].[0-9][0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9][0-9].[0-9].[0-9]%',[column]),9),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9][0-9]%',[column]),12),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9].[0-9][0-9][0-9].[0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9].[0-9][0-9][0-9].[0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9].[0-9][0-9].[0-9]%',[column]),9),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9].[0-9].[0-9][0-9][0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9].[0-9].[0-9][0-9]%',[column]),9),
SUBSTRING([column],PATINDEX('%[0-9].[0-9][0-9].[0-9].[0-9]%',[column]),8),
SUBSTRING([column],PATINDEX('%[0-9].[0-9].[0-9][0-9][0-9].[0-9][0-9][0-9]%',[column]),11),
SUBSTRING([column],PATINDEX('%[0-9].[0-9].[0-9][0-9][0-9].[0-9][0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9].[0-9].[0-9][0-9][0-9].[0-9]%',[column]),9),
SUBSTRING([column],PATINDEX('%[0-9].[0-9].[0-9][0-9].[0-9][0-9][0-9]%',[column]),10),
SUBSTRING([column],PATINDEX('%[0-9].[0-9].[0-9][0-9].[0-9][0-9]%',[column]),9),
SUBSTRING([column],PATINDEX('%[0-9].[0-9].[0-9][0-9].[0-9]%',[column]),8),
SUBSTRING([column],PATINDEX('%[0-9].[0-9].[0-9].[0-9][0-9][0-9]%',[column]),9),
SUBSTRING([column],PATINDEX('%[0-9].[0-9].[0-9].[0-9][0-9]%',[column]),8),
SUBSTRING([column],PATINDEX('%[0-9].[0-9].[0-9].[0-9]%',[column]),7),
'')
FROM
Far away is close at hand in the images of elsewhere.
Anon.
September 22, 2004 at 1:18 pm
I agree it is a messy search for an IP string. Consider a string like this:
DECLARE @string varchar(255)
SET @string = 'Title: Lawn Mower. Cost: $125.67. IP Address: 123.12.1.16'
There's a false positive in this string. I think the pure characteristics of an IP address
are: The IP string is all numbers and periods (period: aka point aka full-stop). There are
exactly 3 periods, the length of any numeric substring is between 1 and 3, and the total length of
the IP string is between 7 and 15. In fact, the value of each numeric substring must be
between 0 and 255.
-- Here's the plan for the stored procedure:
-- 1. Find iX = location of first '.'
-- 2. Set iY = iX-1; decrement while char at iY is NUMERIC, setting iBegin = iY
-- 3. Set iX2, iX3 = Location of second and third '.'
-- 4. Set iY = iX3+1; increment while char at IY is NUMERIC, setting iEnd = iY
-- 5. You have the begin and end. Calc IPLENGTH = end - begin + 1; get the candidate IP address
-- 6. Additional checks; add these if the above returns strings that are not the IP address:
-- - All characters from IX+1 to IX2-1 are numeric
-- - All characters from IX2+1 to IX3-1 are numeric
-- - Length of string is <= 15 (eg: 123.123.123.123)
-- - Length of string is >= 7 (eg: 1.2.3.4)
-- - Etc., etc. I'd code as few edits as necessary to get a reasonable answer.
-- 7. If the found string fails the edit, then resume at the next period.
DECLARE @work varchar(255), @string varchar(255)
DECLARE @ix int, @ix2 int, @ix3 int, @iy int, @ibegin int, @iend int
DECLARE @len int, @iplength int, @ipstring, @valid int
-- 1. Find iX = Location of first '.'
SET @work = @string
SET @len = LEN(@work)
SET @ix = CHARINDEX('.',@work,1)
SET @valid = 1
WHILE @valid = 1
BEGIN
-- 2. Set iY = iX-1; decrement while char at iY is NUMERIC, setting iBegin = iY
SET @iy = @ix -1
SET @ibegin = @ix
WHILE ISNUMERIC(SUBSTRING(@work,@iy,1))
BEGIN
SET @ibegin = @iy
IF @iy = 1 BREAK
SET @iy = @iy - 1
END
-- 3. Set iX2, iX3 = Location of second and third '.'
SET @ix2 = CHARINDEX('.',@work,@ix+1)
SET @ix3 = CHARINDEX('.',@work,@ix2+1)
-- 4. Set iY = iX3+1; increment while char at IY is NUMERIC, setting iEnd = iY
SET @iy = @ix3 + 1
SET @iend = @ix3
WHILE ISNUMERIC(SUBSTRING(@work,@iy,1))
BEGIN
SET @iend = @iy
IF @iy = @len BREAK
SET @iy = @iy + 1
END
-- 5. You have the begin and end. Calc IPLENGTH = end - begin + 1; get the candidate IP address
SET @iplength = @iend - @begin + 1
SET @ipstring = SUBSTRING(@work,@ibegin,@iplength)
SET @valid = 1
-- 6. Additional checks; add these if the above returns strings that are not the IP address:
-- - All characters from IX+1 to IX2-1 are numeric
-- - All characters from IX2+1 to IX3-1 are numeric
-- - Length of string is <= 15 (eg: 123.123.123.123)
-- - Length of string is >= 7 (eg: 1.2.3.4)
-- 7. IF THE STRING FAILS ANY EDIT(S) - SET @valid = 0
-- and keep looking starting at the next '.'
IF @valid = 0
BEGIN
SET @ix = @ix2
END
END
-- Return to the caller: @ipstring
Bob Monahon
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply