TSQL question

  • I have a database whose sole purpose is to gather Internet usage logs.  The problem is, I have a ton of rows for each site that gets visited.  A URL for each gif and jpg, etc...  10's of rows that are not identical, but are redundant.

    Snipet of data:

    http://www.google.com/

    http://www.google.com/intl/en/images/logo.gif

    http://www.google.com/imghp?

    http://www.google.com/intl/en_ALL/images/images_hp.gif

    Basically, I'm trying to pare down the reports that are full of redundant information.  I'm wondering if there is a way to remove anything after the "/" character.  If I remove the http:// portion using LEFT and then remove everything after the "/" character.  I could then select DISTINCT leaving me with a result set like:

    http://www.google.com

    I've been going through the String functions in BOL for the last couple hours now...  Any ideas would be greatly appreciated.

  • I guess the LEFT thing won't work to peal off the "http://" portion...  rather a SUBSTRING would work better.  But I still don't know how to remove everything after the "/" leaving a nice, clean, http://www.google.com

  • declare @RowData varchar(100)

    SET @RowData = 'http://www.google.com/intl/en_ALL/images/images_hp.gif'

    PRINT SUBSTRING(REPLACE(@RowData,'http://',''),1,(CHARINDEX('/',REPLACE(@RowData,'http://',''),1) - 1))

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I had to do this exact thing in the past.  This is what I used:

    SELECT SUBSTRING(URL,1,CHARINDEX('/',URL,8)-1)

    FROM SITES

    It took me a couple of attempts and a number of different filters which were variations on this since there were some many different site names.  This should get you started though.

  •  

    Declare @a varchar(100)

    SET @a = 'http://www.google.com/intl/en_ALL/images/images_hp.gif'

    select SUBSTRING(@a,8, charindex('/',@a, 8)-LEN('http://')-1)

    WHERE charindex('/',@a, 8) > 0

    AND LEFT(@a,7) = 'http://'

    This will work if the url starts with 'http://' and there is a '/' anywhere after that.

     

  • Not sure what the limits of your requirements are, but if it could possibly pertain to your situation, don't forget to take URLs starting with "https://" into account...

  • DECLARE @v-2 VARCHAR(200)

    SET @v-2 = 'http://www.google.com/intl/en/images/logo.gif'

    SELECT SUBSTRING(@v, CHARINDEX('//', @v-2) + 2, CHARINDEX('/', @v-2, CHARINDEX('//', @v-2) + 2) - (CHARINDEX('//', @v-2) + 2))

    --That will work for *any* string prior to the '//'.  If it's possible there won't be a '//' at all, use this:

    SELECT CASE WHEN CHARINDEX('//', @v-2) = 0 THEN LEFT(@v, CHARINDEX('/', @v-2) - 1) ELSE <expression_above> END

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • You guys are the best!  I got exactly what I needed to get working.  Interestingly enough, we do NOT log https:// traffic.  Hmm...  Might as well code it in now as I'm sure that day will come soon enough.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply