Shouldn't this process be faster?

  • I have ISA Server 2004 logging to an SQL database. The logging works as it should; however, queries are processed really slowly, or so it seems to me. It should be said that I'm an SQL noob. Here's the setup:

    Windows Server 2003 SP2 running SQL 2005 Workgroup SP2. Server is a dual-core Xeon 3Ghz with 2GB RAM.

    The database consists of 1 flat table that I'm trying to query. Here is the setup on the table:

    CREATE TABLE [dbo].[WebProxyLog](
        [ClientIP] [varchar](32) NULL,
        [ClientUserName] [nvarchar](514) NULL,
        [ClientAgent] [varchar](128) NULL,
        [ClientAuthenticate] [varchar](5) NULL,
        [logDate] [datetime] NULL,
        [logTime] [datetime] NULL,
        [service] [varchar](16) NULL,
        [servername] [nvarchar](32) NULL,
        [referredserver] [varchar](32) NULL,
        [DestHost] [varchar](255) NULL,
        [DestHostIP] [varchar](32) NULL,
        [DestHostPort] [int] NULL,
        [processingtime] [int] NULL,
        [bytesrecvd] [int] NULL,
        [bytessent] [int] NULL,
        [protocol] [varchar](12) NULL,
        [transport] [varchar](8) NULL,
        [operation] [varchar](24) NULL,
        [uri] [varchar](2048) NULL,
        [mimetype] [varchar](32) NULL,
        [objectsource] [varchar](12) NULL,
        [resultcode] [int] NULL,
        [CacheInfo] [int] NULL,
        [rule] [nvarchar](128) NULL,
        [FilterInfo] [nvarchar](128) NULL,
        [SrcNetwork] [nvarchar](128) NULL,
        [DstNetwork] [nvarchar](128) NULL,
        [ErrorInfo] [int] NULL,
        [Action] [varchar](32) NULL
    )

    I have a clustered index using the DestHost column.

    The database has about 700K records.

    Here is one of the queries I'm trying to run:

    SELECT TOP 50 DestHost, COUNT(DestHost) AS 'Number of requests'
    FROM WebProxyLog
    WHERE LEN(DestHost) > 1 AND ClientUserName <> 'anonymous'
    GROUP BY DestHost
    ORDER BY 'Number of requests' DESC

    I also have a stored procedure set up using this query. It doesn't matter whether I execute the SP or simply run the query as above; either way it takes a full two minutes to finish.

    It would seem to me that this is entirely too long.

    I have also noticed that when I run this query, the hard drive chugs away until it's finished. During the query, the processor is running at about 5% or less.

    Is this a normal time for this query to run? If not, any ideas where to look to try and resolve the problem? Is it something to do with it being SQL 2005 Workgroup?

    Sorry for the long post. Just wanted to be as thorough as possible.

    Any help is greatly appreciated.

    Jeff

  • It's not a good idea to have a clustered index on a varchar(255) column, especially when the table contains 700K records. Instead, you can add an identity column and make it the primary key.
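    For example, something along these lines (the column name [Id] is just a placeholder, and the existing clustered index on DestHost would have to be dropped before the clustered primary key can be created):

    -- Add a narrow surrogate key column
    ALTER TABLE dbo.WebProxyLog ADD [Id] INT IDENTITY(1,1) NOT NULL

    -- Make it the clustered primary key (drop the current clustered index on DestHost first)
    ALTER TABLE dbo.WebProxyLog ADD CONSTRAINT PK_WebProxyLog PRIMARY KEY CLUSTERED ([Id])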

    Looking at the query, I think you need to add a non-clustered index on the DestHost and ClientUserName columns...

    CREATE INDEX [IX_WebProxyLog_DestHost_ClientUserName] ON dbo.WebProxyLog( DestHost, ClientUserName )

    --Ramesh


  • OK, there are a few issues at play here.

    Your filter (LEN(DestHost) > 1) is forcing a table scan, as any function on a column prevents indexes from being used for that condition. (See this article for more details.) What are you trying to filter out with that condition?

    Do you have any other indexes on the table? (Not that the other condition is likely to be seekable, depending on the number of rows that are anonymous.)

    Putting the clustered index on a wide column is usually a bad idea. I would recommend moving it to the logDate column, but see the next point.

    Why do you have the log date and the log time stored in separate columns (both datetime columns)? They should be in one column.

    As a first suggestion, I would say move the cluster to the logDate column and add a nonclustered index on (ClientUserName, DestHost). Run the query again and check the execution plan. If it doesn't appear that the index is used, switch the order of the columns to (DestHost, ClientUserName).
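    A rough sketch of the DDL (the index names here are made up, and the existing clustered index on DestHost has to be dropped under whatever name it actually has):

    -- Drop the current clustered index; look up its real name in sys.indexes first
    -- DROP INDEX [name_of_existing_clustered_index] ON dbo.WebProxyLog

    -- Cluster on the narrow logDate column instead
    CREATE CLUSTERED INDEX [CIX_WebProxyLog_logDate] ON dbo.WebProxyLog (logDate)

    -- Nonclustered index to support the filter and aggregation
    CREATE NONCLUSTERED INDEX [IX_WebProxyLog_ClientUserName_DestHost]
        ON dbo.WebProxyLog (ClientUserName, DestHost)

    -- If the plan shows that index isn't used, drop it and try the columns the other way around:
    -- CREATE NONCLUSTERED INDEX [IX_WebProxyLog_DestHost_ClientUserName]
    --     ON dbo.WebProxyLog (DestHost, ClientUserName)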

    As a longer term suggestion, that table could use some normalisation. The client info and the dest info could both be broken out into lookup tables. Reducing the width of the row will also help query speed, and will save some space.

    If those suggestions don't help, attach the execution plan (saved as a .sqlplan file and zipped) to a post in this thread and I can take another look.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • As for the columns, they were created by ISA Server, and that's the way ISA needs the table constructed to be able to INSERT into it properly. I agree it would make more sense to have logDate and logTime in one column and to have ClientUserName be a lookup, but I have no choice.

    Concerning the indexes, or lack thereof: the table originally had non-clustered indexes on logDate and logTime. In an article on performance I read, the writer suggested creating a clustered index on the field you use the most, so I chose the DestHost field. Pretty much arbitrarily. Again, I'm a noob, but trying to learn.

    I borrowed the LEN > 1 part of the query from someone else. It made sense at the time, as it would exclude records where that field is empty, but I'm not really sure what good it's doing. I'll remove it.

    Concerning ClientUserName <> 'anonymous':

    Because it's logging ISA Server accesses, and because of the way Internet accesses happen, there are MANY entries with 'anonymous'. Before I posted here last night, to try and speed up performance, I actually deleted all of the entries that had ClientUserName = 'anonymous'. There were 390K of them. Getting rid of them did not help. I can take that part out of the query; I just figured that if I created reports from this, I didn't really want to see the 'anonymous' entries.

    Thank you for your input. I will try creating the indexes you suggested, modify the query and let you know what happens.

    Jeff

  • Makes sense.

    Try moving the cluster to logDate (or, if you do queries by date and time, to logDate and logTime together). It's not ideal, but it's better than a varchar(255).

    The reason is that the clustering key is included in every single nonclustered index as the lookup back to the table, so a really wide clustered index forces really wide nonclustered indexes.

    The wider the keys are, the fewer entries fit into an index page and the deeper the index tree becomes. That results in heavier IO when querying the table.

    Also, having the first field of the cluster on logDate will reduce fragmentation, which is probably quite bad at this point.
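    To put rough numbers to the width point (an illustration only, ignoring per-row overhead): an 8 KB index page has roughly 8,060 usable bytes, so a 255-byte varchar clustering key leaves room for only around 30 entries per page, while an 8-byte datetime key allows several hundred. Over 700K rows, that keeps the index tree much shallower, and each lookup reads far fewer pages.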

    From what you say, try changing the query to read

    where desthost <> '' and desthost is not null and ClientUserName <> 'anonymous'

    Still try the two different index column orders I suggested and see which one helps. Gut feel is that it will be the (DestHost, ClientUserName) one, as that index will store the data in the correct order for the aggregation.
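    Putting that together, the query would read something like this (a sketch only; the ORDER BY is written against the aggregate rather than the quoted alias, but the result is the same):

    SELECT TOP 50 DestHost, COUNT(DestHost) AS 'Number of requests'
    FROM WebProxyLog
    WHERE DestHost <> '' AND DestHost IS NOT NULL
      AND ClientUserName <> 'anonymous'
    GROUP BY DestHost
    ORDER BY COUNT(DestHost) DESC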

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I changed the clustered index to logDate. I took the LEN(DestHost) part out of the query and re-ran it. It came back in 1:20. Better, but not good. Then, as you instructed, I added the non-clustered index with DestHost and ClientUserName. It came back in 1 second! WooHoo!

    Now, basically what the query does is give me the top 50 sites visited by my network for all time. What I'd also like to be able to do is throw some date ranges in, i.e. Top 50 this week, this month, today, etc.

    To get today, I used the same query, but just added:

    WHERE logDate= '10-25-2007'

    It added no time to the response time. Still about 1 second. GREAT!

    Now, I'm off-topic and if I need to repost to a different group, please tell me, but what is the best way for me to query for TODAY, or THIS WEEK, or THIS MONTH?

    The date entry copied from the datasheet of the table looks like this:

    logDate: 10/20/2007 12:00:00 AM
    logTime: 12/30/1899 11:08:45 PM

    Remember, they're two separate fields: logDate and logTime

    This is just the way that ISA Server inputs the data.

    Thank you so much for your help. You're fantastic. Again, if I need to start a new thread, let me know.

    Jeff

  • Now you know the power of a well-placed index. 😀 Glad I could help.

    As for your dates: the easiest way is to do a range query on logDate. We can ignore the log time, since you're looking for today and larger granularities. If you wanted requests for this hour, it would be a fair bit more difficult.

    -- This is an example for today. You can subtract days from the variable date to get previous days.
    DECLARE @DateRangeStart DATETIME
    SET @DateRangeStart = DATEADD(dd, DATEDIFF(dd, 0, GETDATE()), 0) -- trick for removing the time from a datetime
    SELECT ... WHERE LogDate = @DateRangeStart

    -- For this week. Slightly more complex. Test my date manipulation; you may need to adjust by 1 day, depending on when you consider the week to have started. As with the day one, you can DATEADD(ww, ... to get weeks in the past.
    DECLARE @DateRangeStart DATETIME, @DateRangeEnd DATETIME
    SET @DateRangeStart = DATEADD(ww, DATEDIFF(ww, 0, GETDATE()), 0)
    SET @DateRangeEnd = DATEADD(dd, 6, @DateRangeStart)
    SELECT ... WHERE LogDate BETWEEN @DateRangeStart AND @DateRangeEnd -- works because there are no times on these dates

    -- And for the month. Same construction as for the week. Again, you can DATEADD(mm, ... to get months in the past.
    DECLARE @DateRangeStart DATETIME, @DateRangeEnd DATETIME
    SET @DateRangeStart = DATEADD(mm, DATEDIFF(mm, 0, GETDATE()), 0)
    SET @DateRangeEnd = DATEADD(dd, -1, DATEADD(mm, 1, @DateRangeStart))
    SELECT ... WHERE LogDate BETWEEN @DateRangeStart AND @DateRangeEnd -- works because there are no times on these dates
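    For instance, plugging the week range into your top-50 query would look roughly like this (a sketch; the filter follows the rewrite suggested earlier):

    DECLARE @DateRangeStart DATETIME, @DateRangeEnd DATETIME
    SET @DateRangeStart = DATEADD(ww, DATEDIFF(ww, 0, GETDATE()), 0)
    SET @DateRangeEnd = DATEADD(dd, 6, @DateRangeStart)

    SELECT TOP 50 DestHost, COUNT(DestHost) AS 'Number of requests'
    FROM WebProxyLog
    WHERE DestHost <> '' AND ClientUserName <> 'anonymous'
      AND logDate BETWEEN @DateRangeStart AND @DateRangeEnd
    GROUP BY DestHost
    ORDER BY COUNT(DestHost) DESC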

    How does that work?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thank you. I can understand the logic behind it and I'll see how to make it work within Query Analyzer.

    Here's how I'm going to apply this. I have created a series of simple ASPs that basically just have a connection and a query, then Response.Write the output in a table. Seems to work well so far.

    It appears, though, that my queries are getting more involved. Should I be looking at creating stored procedures instead of having the whole query in my ASP? This is for internal use only, so I'm not really worried about someone attacking my site, and I'm not too concerned about the ASP code being busy, provided that doing so doesn't degrade the performance of the ASP.

    Does that make sense?

    Jeff

  • Jeff Wikle (10/25/2007)


    It appears, though, that my queries are getting more involved. Should I be looking to start creating Stored Procedures instead of having the whole query in my ASP?

    Absolutely. It's good practice, and it has several subtle effects on performance and memory. You probably won't need to bother with those, but it's still good practice.

    I would suggest three separate stored procs for day, week and month. Alternatively, just one proc that takes the start and end dates of the range as parameters from the ASP, instead of calculating them within SQL.
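    A minimal sketch of the single-proc version (the proc and parameter names are just placeholders):

    CREATE PROCEDURE dbo.TopRequestedHosts
        @DateRangeStart DATETIME,
        @DateRangeEnd   DATETIME
    AS
    BEGIN
        SET NOCOUNT ON

        SELECT TOP 50 DestHost, COUNT(DestHost) AS 'Number of requests'
        FROM dbo.WebProxyLog
        WHERE DestHost <> '' AND ClientUserName <> 'anonymous'
          AND logDate BETWEEN @DateRangeStart AND @DateRangeEnd
        GROUP BY DestHost
        ORDER BY COUNT(DestHost) DESC
    END

    The ASP then just calls the proc with the two dates, so the date logic lives in one place.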

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Outstanding! Looks like I have some homework now.

    Thank you so much. You've been a tremendous help.

    Jeff

  • My pleasure. Glad I could help.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
