December 20, 2005 at 12:34 pm
I have a database whose sole purpose is to gather Internet usage logs. The problem is, I have a ton of rows for each site that gets visited. A URL for each gif and jpg, etc... 10's of rows that are not identical, but are redundant.
Snipet of data:
http://www.google.com/intl/en/images/logo.gif
http://www.google.com/intl/en_ALL/images/images_hp.gif
Basically, I'm trying to pare down the reports that are full of redundant information. I'm wondering if there is a way to remove anything after the "/" character. If I remove the http:// portion using LEFT and then remove everything after the "/" character. I could then select DISTINCT leaving me with a result set like:
I've been going through the String functions in BOL for the last couple hours now... Any ideas would be greatly appreciated.
December 20, 2005 at 12:48 pm
I guess the LEFT thing won't work to peal off the "http://" portion... rather a SUBSTRING would work better. But I still don't know how to remove everything after the "/" leaving a nice, clean, http://www.google.com
December 20, 2005 at 12:50 pm
declare @RowData varchar(100)
SET @RowData = 'http://www.google.com/intl/en_ALL/images/images_hp.gif'
PRINT SUBSTRING(REPLACE(@RowData,'http://',''),1,(CHARINDEX('/',REPLACE(@RowData,'http://',''),1) - 1))
December 20, 2005 at 12:53 pm
I had to do this exact thing in the past. This is what I used:
SELECT SUBSTRING(URL,1,CHARINDEX('/',URL,8)-1)
FROM SITES
It took me a couple of attempts and a number of different filters which were variations on this since there were some many different site names. This should get you started though.
December 20, 2005 at 1:01 pm
December 21, 2005 at 7:19 am
Not sure what the limits of your requirements are, but if it could possibly pertain to your situation, don't forget to take URLs starting with "https://" into account...
December 21, 2005 at 8:44 am
DECLARE @v-2 VARCHAR(200)
SET @v-2 = 'http://www.google.com/intl/en/images/logo.gif'
SELECT SUBSTRING(@v, CHARINDEX('//', @v-2) + 2, CHARINDEX('/', @v-2, CHARINDEX('//', @v-2) + 2) - (CHARINDEX('//', @v-2) + 2))
--That will work for *any* string prior to the '//'. If it's possible there won't be a '//' at all, use this:
SELECT CASE WHEN CHARINDEX('//', @v-2) = 0 THEN LEFT(@v, CHARINDEX('/', @v-2) - 1) ELSE <expression_above> END
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 21, 2005 at 10:24 am
You guys are the best! I got exactly what I needed to get working. Interestingly enough, we do NOT log https:// traffic. Hmm... Might as well code it in now as I'm sure that day will come soon enough.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply