Query Help

  • I have a db of web usage logs. Each hit, i.e. page, image, etc is a record.

    Say I was visiting 3 different Microsoft sites, the "Site Visited" field, which is called DestHost, would be logged like this:

    support.microsoft.com

    licensing.microsoft.com

    http://www.microsoft.com

    Now say that each time I was at each site I had 3 hits at each site.

    The query I'm running now, which works just fine is:

    SELECT DestHost, COUNT(DestHost) as 'Hits'

    FROM WebProxyLog

    GROUP BY DestHost

    ORDER BY 'Hits' DESC

    Right now, it returns this:

    support.microsoft.com 3

    licensing.microsoft.com 3

    http://www.microsoft.com 3

    Now for my issue. I'm looking for a way for me to formulate a query that would strip off the subdomain (support, licensing, www) and group the domain and total the hits so that my query would return this:

    microsoft.com 9

    There also may be multiple subdomains, for instance: technet.support.microsoft.com

    I would still want that to be grouped with microsoft.com

    Is it possible to say, "Starting at the right of the DestHost field, go left until the 2nd dot and drop that dot and everything to the left of it. Now group by DestHost and count."

    Hope this makes sense.

    Thank you for your help.

    Jeff

  • Is it possible to say, "Starting at the right of the DestHost field, go left until the 2nd dot and drop that dot and everything to the left of it. Now group by DestHost and count."

    Heh... I would hope so... here's the test...

    CREATE TABLE dbo.Hits (RowNum INT IDENTITY(1,1) PRIMARY KEY, Site VARCHAR(100))

    INSERT INTO dbo.Hits (Site)

    SELECT 'support.microsoft.com' UNION ALL

    SELECT 'licensing.microsoft.com' UNION ALL

    SELECT 'www.microsoft.com' UNION ALL

    SELECT 'support.microsoft.com' UNION ALL

    SELECT 'licensing.microsoft.com' UNION ALL

    SELECT 'www.microsoft.com' UNION ALL

    SELECT 'support.microsoft.com' UNION ALL

    SELECT 'licensing.microsoft.com' UNION ALL

    SELECT 'www.microsoft.com'

    SELECT Site = PARSENAME(Site,2)+'.'+ParseName(Site,1),

    Hits = COUNT(*)

    FROM dbo.Hits

    GROUP BY PARSENAME(Site,2)+'.'+ParseName(Site,1)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • you can cheat and create a hosts table

    create table hosts( hostname varchar(50))

    select hostname, count(*) as Hits

    from hosts h

    inner join webproxylog w

    on w.Desthost like '%'+h.hostname

    group by hostname

    order by Hits desc

    if you are looking for a more automated approach, create a function that returns the last hostname and inserts the records that don't exists in the hosts table or just run

    select desthost

    from webproxylog w

    inner join hosts h

    on w.desthost like '%'+h.hostname

    where h.hostname is null

    to see what hosts are not being returned and add them yourself

    Paul Ross

  • I believe this might work - not sure how effecient tho...I would recommend using the front end to do this rather than the database. But it was fun trying to get it to work with TSQL...I never knew about the REVERSE function before.

    There needs to be at least two periods in the Site in order to work - that is why I prepended the '..' in the query.

    Test thoroughly, I did not...

    CREATE TABLE dbo.Hits (RowNum INT IDENTITY(1,1) PRIMARY KEY, Site VARCHAR(100))

    INSERT INTO dbo.Hits (Site)

    SELECT 'abc.support.microsoft.com' UNION ALL

    SELECT 'microsoft.com' UNION ALL

    SELECT 'com' UNION ALL

    SELECT 'support.microsoft.com' UNION ALL

    SELECT 'licensing.microsoft.com' UNION ALL

    SELECT 'www.microsoft.com' UNION ALL

    SELECT 'support.microsoft.com' UNION ALL

    SELECT 'licensing.microsoft.com' UNION ALL

    SELECT 'com' UNION ALL

    SELECT 'microsoft.com' UNION ALL

    SELECT 'com' UNION ALL

    SELECT 'www.microsoft.com'

    SELECT RIGHT('..'+Site,CHARINDEX('.',REVERSE('..'+Site),CHARINDEX('.',REVERSE('..'+Site))+1)-1)

    , COUNT(Site) as 'Hits'

    FROM Hits

    GROUP BY RIGHT('..'+Site,CHARINDEX('.',REVERSE('..'+Site),CHARINDEX('.',REVERSE('..'+Site))+1)-1)

    ORDER BY RIGHT('..'+Site,CHARINDEX('.',REVERSE('..'+Site),CHARINDEX('.',REVERSE('..'+Site))+1)-1)

  • "REVERSE" is pretty expensive performance-wise... PARSENAME still does the trick even with the modified list...

    CREATE TABLE dbo.Hits (RowNum INT IDENTITY(1,1) PRIMARY KEY, Site VARCHAR(100))

    INSERT INTO dbo.Hits (Site)

    SELECT 'abc.support.microsoft.com' UNION ALL

    SELECT 'microsoft.com' UNION ALL

    SELECT 'com' UNION ALL

    SELECT 'support.microsoft.com' UNION ALL

    SELECT 'licensing.microsoft.com' UNION ALL

    SELECT 'www.microsoft.com' UNION ALL

    SELECT 'support.microsoft.com' UNION ALL

    SELECT 'licensing.microsoft.com' UNION ALL

    SELECT 'com' UNION ALL

    SELECT 'microsoft.com' UNION ALL

    SELECT 'com' UNION ALL

    SELECT 'www.microsoft.com'

    SELECT Site = PARSENAME(Site,2)+'.'+ParseName(Site,1),

    Hits = COUNT(*)

    FROM dbo.Hits

    GROUP BY PARSENAME(Site,2)+'.'+ParseName(Site,1)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The reason I used reverse is if you have more subdomains...I believe PARSENAME is limited to 4 parts, because that is the maximum you ever need for object names.

    select PARSENAME('abc.def.ghi.jkl.mno', 1)...returns NULL.

    you are absolutely right about effeciency if the input is limited to maximum 4 parts.

    jg

  • Why not just a Tally table driven split function, then?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply