December 2, 2010 at 5:30 pm
I have a column with these values
The Website is mysite1.com because it is my site.
The Website is hissite2.com because it is his site.
The Website is yoursite3.com because it is your site.
The Website is mysite4.com because it isn't my site.
The Website is hersite5.com because this is her site.
How do I extract only website name
mysite1.com
hissite2.com
yoursite3.com
mysite4.com
hersite5.com
Logically I should be able to capture text between 'The Website is ' and ' because'. I know its possible using substring functions but I couldn't figure out. Any help?
December 2, 2010 at 6:55 pm
Like this:
DECLARE @test-2 TABLE (RowID INT IDENTITY,
Col1 varchar(100));
INSERT INTO @test-2
SELECT 'The Website is mysite1.com because it is my site.' UNION ALL
SELECT 'The Website is hissite2.com because it is his site.' UNION ALL
SELECT 'The Website is yoursite3.com because it is your site.' UNION ALL
SELECT 'The Website is mysite4.com because it isn''t my site.' UNION ALL
SELECT 'The Website is hersite5.com because this is her site.';
SELECT RowID,
Substring(Col1, t2.Pos+3, t3.Pos-t2.Pos-4)
FROM @test-2 t
CROSS APPLY(select Pos = CharIndex('is', t.Col1)) t2
CROSS APPLY(select Pos = CharIndex('because', t.Col1)) t3;
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
December 3, 2010 at 1:34 pm
Its Perfect.. 🙂
December 3, 2010 at 10:53 pm
Black Mamba (12/3/2010)
Its Perfect.. 🙂
True enough. Wayne does a great job on these types of things. The real question here is, have you studied the code to know what's going on well enough to be able to apply it to other situations? That's the important part.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2010 at 3:47 pm
I think you can do that more easily, like so:
SELECT SUBSTRING(col1, (CHARINDEX('Website is', col1) + 11),
CHARINDEX(' because', col1, (CHARINDEX('Website is', col1) + 11)) -
(CHARINDEX('Website is', col1) + 11))
FROM @test-2
WHERE col1 LIKE '%Website is%because%'
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 6, 2010 at 3:55 pm
Thank you. Much easier now... 🙂
December 6, 2010 at 4:18 pm
ScottPletcher (12/6/2010)
I think you can do that more easily, like so:
SELECT SUBSTRING(col1, (CHARINDEX('Website is', col1) + 11),
CHARINDEX(' because', col1, (CHARINDEX('Website is', col1) + 11)) -
(CHARINDEX('Website is', col1) + 11))
FROM @test-2
WHERE col1 LIKE '%Website is%because%'
I don't know... you have 4 instances of CHARINDEX vs just 2. The question I would have is which performs better on a couple of thousand rows?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2010 at 4:21 pm
The second one works better for me..
December 6, 2010 at 5:10 pm
Black Mamba (12/6/2010)
The second one works better for me..
That's cool. What do you mean "better", though?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2010 at 5:16 pm
Unless you can guarantee the data will be in that format, I would personally be looking for a valid web site address.
This is not a complete solution as I have not limited the number of address parts to any standard...but it is a bit more robust I think...(prepares for incoming ....)
However, this does handle tricky data such as "The website is because.com"
It does not handle "The website is google.com." where the final period is not part of the web site address...
DECLARE @test-2 TABLE (RowID INT IDENTITY,
Col1 varchar(100));
INSERT INTO @test-2
SELECT 'The Website is mysite1.com because it is my site.' UNION ALL
SELECT 'The Website is hissite2.com because it is his site.' UNION ALL
SELECT 'The Website is his yoursite3.com because it is your site.' UNION ALL
SELECT 'The Website is mysite4.com' UNION ALL --= No "because"
SELECT 'The Website is hersite5.com because this is her site.' UNION ALL
SELECT 'hersite5.com because this is her site.' UNION ALL --= No "The Website is"
SELECT 'hersite5.com'; --= Just the web site.
;with Tally(N) as --= use a "tally" table to find the start of the website once we have the domain
( select number from master.dbo.spt_values where type ='P' )
SELECT RowID, SUBSTRING(Col1,t3.Pos+1 ,t2.Pos+4-t3.Pos)
FROM @test-2 t
--= The PATINDEX finds a valid website ending (not checked the RFC for all valid domain patterns) e.g. ".com"
CROSS APPLY(select Pos = PATINDEX('%[a-z0-9A-Z_].[a-z][a-z][a-z ] %',Col1+' ')) t2
--= The Tally table is used to find the last "space" before the string containing our potential web site address
CROSS APPLY(select MAX(N) from Tally where N<t2.Pos and SUBSTRING(Col1,N,1)=' ') t3(Pos)
If you have a true "Numbers" table - use that instead of spt_values.
This code also does not check that the web site contains only valid characters. In other words it is just a simple lookup that may not produce valid addresses.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 6, 2010 at 5:39 pm
Well, Thank you guys for all your response. In real-life its just not website but some IP addresses. Website addresses ending with .net / .org/ internam domains etc and some diferent text added before / after teh Ip/URL. So the queries that you gave me parses quite a decent amount of data.
December 7, 2010 at 12:44 pm
I don't know... you have 4 instances of CHARINDEX vs just 2
Quite true. A valid point.
3 of them are exactly the same function. Unfortunately, I don't think SQL will save/"re-use" the first function result for the other two invocations, like, say, some compilers do if you re-issue the exact-same function on exactly the same parameter(s) and no data changes could have occurred in between. With luck, maybe the SQL optimizer will be sophisticated enough one day to do that.:unsure:
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 7, 2010 at 3:35 pm
ScottPletcher (12/7/2010)
I don't know... you have 4 instances of CHARINDEX vs just 2
Quite true. A valid point.
3 of them are exactly the same function. Unfortunately, I don't think SQL will save/"re-use" the first function result for the other two invocations...
There is one way to make sure it does of course: CROSS APPLY the repeated CHARINDEX call.
SELECT SUBSTRING(col1, pos1, CHARINDEX(' because', col1, pos1) - pos1)
FROM @test-2
CROSS APPLY(SELECT CHARINDEX('Website is', col1) + 11) AS x(pos1)
WHERE col1 LIKE '%Website is%because%'
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 8, 2010 at 7:18 am
ScottPletcher (12/7/2010)
I don't know... you have 4 instances of CHARINDEX vs just 2
Quite true. A valid point.
3 of them are exactly the same function. Unfortunately, I don't think SQL will save/"re-use" the first function result for the other two invocations, like, say, some compilers do if you re-issue the exact-same function on exactly the same parameter(s) and no data changes could have occurred in between. With luck, maybe the SQL optimizer will be sophisticated enough one day to do that.:unsure:
Agh... I didn't check for the forumulas being identical. You are correct. SQL Server is pretty smart when it comes to that kind of reuse. But rather than guess, I'll do a performance test tonight to find out for sure in this case and post it here. Good learning opportunity for all of us. Thanks for the feedback, Scott.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply