November 15, 2007 at 7:54 pm
I have a db of web usage logs. Each hit, i.e. page, image, etc is a record.
Say I was visiting 3 different Microsoft sites, the "Site Visited" field, which is called DestHost, would be logged like this:
support.microsoft.com
licensing.microsoft.com
Now say that each time I was at each site I had 3 hits at each site.
The query I'm running now, which works just fine is:
SELECT DestHost, COUNT(DestHost) as 'Hits'
FROM WebProxyLog
GROUP BY DestHost
ORDER BY 'Hits' DESC
Right now, it returns this:
support.microsoft.com 3
licensing.microsoft.com 3
Now for my issue. I'm looking for a way for me to formulate a query that would strip off the subdomain (support, licensing, www) and group the domain and total the hits so that my query would return this:
microsoft.com 9
There also may be multiple subdomains, for instance: technet.support.microsoft.com
I would still want that to be grouped with microsoft.com
Is it possible to say, "Starting at the right of the DestHost field, go left until the 2nd dot and drop that dot and everything to the left of it. Now group by DestHost and count."
Hope this makes sense.
Thank you for your help.
Jeff
November 15, 2007 at 9:35 pm
Is it possible to say, "Starting at the right of the DestHost field, go left until the 2nd dot and drop that dot and everything to the left of it. Now group by DestHost and count."
Heh... I would hope so... here's the test...
CREATE TABLE dbo.Hits (RowNum INT IDENTITY(1,1) PRIMARY KEY, Site VARCHAR(100))
INSERT INTO dbo.Hits (Site)
SELECT 'support.microsoft.com' UNION ALL
SELECT 'licensing.microsoft.com' UNION ALL
SELECT 'www.microsoft.com' UNION ALL
SELECT 'support.microsoft.com' UNION ALL
SELECT 'licensing.microsoft.com' UNION ALL
SELECT 'www.microsoft.com' UNION ALL
SELECT 'support.microsoft.com' UNION ALL
SELECT 'licensing.microsoft.com' UNION ALL
SELECT 'www.microsoft.com'
SELECT Site = PARSENAME(Site,2)+'.'+ParseName(Site,1),
Hits = COUNT(*)
FROM dbo.Hits
GROUP BY PARSENAME(Site,2)+'.'+ParseName(Site,1)
--Jeff Moden
Change is inevitable... Change for the better is not.
November 18, 2007 at 9:09 pm
you can cheat and create a hosts table
create table hosts( hostname varchar(50))
select hostname, count(*) as Hits
from hosts h
inner join webproxylog w
on w.Desthost like '%'+h.hostname
group by hostname
order by Hits desc
if you are looking for a more automated approach, create a function that returns the last hostname and inserts the records that don't exists in the hosts table or just run
select desthost
from webproxylog w
inner join hosts h
on w.desthost like '%'+h.hostname
where h.hostname is null
to see what hosts are not being returned and add them yourself
Paul Ross
November 20, 2007 at 10:25 am
I believe this might work - not sure how effecient tho...I would recommend using the front end to do this rather than the database. But it was fun trying to get it to work with TSQL...I never knew about the REVERSE function before.
There needs to be at least two periods in the Site in order to work - that is why I prepended the '..' in the query.
Test thoroughly, I did not...
CREATE TABLE dbo.Hits (RowNum INT IDENTITY(1,1) PRIMARY KEY, Site VARCHAR(100))
INSERT INTO dbo.Hits (Site)
SELECT 'abc.support.microsoft.com' UNION ALL
SELECT 'microsoft.com' UNION ALL
SELECT 'com' UNION ALL
SELECT 'support.microsoft.com' UNION ALL
SELECT 'licensing.microsoft.com' UNION ALL
SELECT 'www.microsoft.com' UNION ALL
SELECT 'support.microsoft.com' UNION ALL
SELECT 'licensing.microsoft.com' UNION ALL
SELECT 'com' UNION ALL
SELECT 'microsoft.com' UNION ALL
SELECT 'com' UNION ALL
SELECT 'www.microsoft.com'
SELECT RIGHT('..'+Site,CHARINDEX('.',REVERSE('..'+Site),CHARINDEX('.',REVERSE('..'+Site))+1)-1)
, COUNT(Site) as 'Hits'
FROM Hits
GROUP BY RIGHT('..'+Site,CHARINDEX('.',REVERSE('..'+Site),CHARINDEX('.',REVERSE('..'+Site))+1)-1)
ORDER BY RIGHT('..'+Site,CHARINDEX('.',REVERSE('..'+Site),CHARINDEX('.',REVERSE('..'+Site))+1)-1)
November 20, 2007 at 9:27 pm
"REVERSE" is pretty expensive performance-wise... PARSENAME still does the trick even with the modified list...
CREATE TABLE dbo.Hits (RowNum INT IDENTITY(1,1) PRIMARY KEY, Site VARCHAR(100))
INSERT INTO dbo.Hits (Site)
SELECT 'abc.support.microsoft.com' UNION ALL
SELECT 'microsoft.com' UNION ALL
SELECT 'com' UNION ALL
SELECT 'support.microsoft.com' UNION ALL
SELECT 'licensing.microsoft.com' UNION ALL
SELECT 'www.microsoft.com' UNION ALL
SELECT 'support.microsoft.com' UNION ALL
SELECT 'licensing.microsoft.com' UNION ALL
SELECT 'com' UNION ALL
SELECT 'microsoft.com' UNION ALL
SELECT 'com' UNION ALL
SELECT 'www.microsoft.com'
SELECT Site = PARSENAME(Site,2)+'.'+ParseName(Site,1),
Hits = COUNT(*)
FROM dbo.Hits
GROUP BY PARSENAME(Site,2)+'.'+ParseName(Site,1)
--Jeff Moden
Change is inevitable... Change for the better is not.
November 21, 2007 at 3:46 pm
The reason I used reverse is if you have more subdomains...I believe PARSENAME is limited to 4 parts, because that is the maximum you ever need for object names.
select PARSENAME('abc.def.ghi.jkl.mno', 1)...returns NULL.
you are absolutely right about effeciency if the input is limited to maximum 4 parts.
jg
November 21, 2007 at 4:15 pm
Why not just a Tally table driven split function, then?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply