How to fix a non-indexed huge table?

  • Before we all get too worked up about the LDF and MDF sizes, what percentage is actually utilized for the files?

  • quote

    Yes like bnordberg said this is from a system generated log file.

    Then, the bottom line is, you probably shouldn't change it by adding an IDENTITY column to it... it could break whatever code is feeding it.  The best you'll be able to do is index it properly.  Man... hard to believe that a "system generated log file" would be setup to violate the 8060 rule...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Heh... "Worked up" isn't the point... the real fact is, it shouldn't have happened that way and they shouldn't be in the position where they have to shrink it... of course, if they don't shrink it, they'll never have to worry about an auto-growth delay ever again

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • OK, I got the query result:

    select max(len([cs-uri-query])) as [max-cs-uri-query], max(len([cs(User-Agent)])) as [max-cs-User-Agent], max(len([cs(Referer)])) as [max-cs-Referer] from iislog

    took almost 28 minutes to finish on server this time.

    What does this mean? why performance slowed down after index added? not reasonable.

  • I really don't think you will get too much performance increase out of indexes on this table. Mainly because, except for date and time, most of the values in the fields will contain unique values. There is value in creating a clustered index on unique valued fields - but not much of a performance benefit. Effective indexes are those with a small percentage of duplicated values. I suspect if you index date and start limiting this to date periods your performance will increase - otherwise your just spinning your wheels (or disks).

    Also notice in your example of records from the log file there are lots of gif and jpg hits that are simply images contained on a page. For example if you have 1 page with 50 images on it, IIS will log 51 hits every time this page is loaded. Get rid of the 50 image hits and keep the 1. Your marketing dept may not like this, but it is "truth" (lies, damn lies, statistic and web log analysis - in that order). ETL this data to a new table - if you have pic heavy sites you will drastically reduce your recordset - and increase performance.

  • Hi bnordberg,

    I didn't index date.

    This is the original table schema:

    CREATE TABLE [dbo].[IISlog] (

    [date] [datetime] NULL ,

    [time] [datetime] NULL ,

    [c-ip] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [cs-username] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [s-ip] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [s-port] [smallint] NULL ,

    [cs-method] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [cs-uri-stem] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [cs-uri-query] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [sc-status] [smallint] NULL ,

    [cs(User-Agent)] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [cs(Referer)] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL

    ) ON [PRIMARY]

    GO

    This is what I did suggested by David Nash:

    ALTER TABLE IISLog ADD IISLog_id INT IDENTITY CONSTRAINT column_IISLog_id_pk PRIMARY KEY

  • Hi bnordberg,

    I didn't index date.

    This is the original table schema:

    CREATE TABLE [dbo].[IISlog] (

    [date] [datetime] NULL ,

    [time] [datetime] NULL ,

    [c-ip] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [cs-username] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [s-ip] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [s-port] [smallint] NULL ,

    [cs-method] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [cs-uri-stem] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [cs-uri-query] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [sc-status] [smallint] NULL ,

    [cs(User-Agent)] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [cs(Referer)] [varchar] (4096) COLLATE SQL_Latin1_General_CP1_CI_AS NULL

    ) ON [PRIMARY]

    GO

    This is what I did suggested by David Nash:

    ALTER TABLE IISLog ADD IISLog_id INT IDENTITY CONSTRAINT column_IISLog_id_pk PRIMARY KEY

  • I guess I've lost the value you are looking for in creating an index on this table. Since you have no control over the writes to it (IIS is in charge!), what would creating a primary key constraint buy you? It will slow down writes to the table. I assume the table will not be used in any further relational models where a surrogate key could be used in a related table? Creating the primary key will just bloat your table with an extra field and an index that would be useless.

  • Then in this particular case, what would you recommend to do to make it faster when trying to dig something from the database? I thought adding index would benefit that.

  • I agree with Jeff and bnordberg.

    I see no reason to add a primary key/clustered index on this table as it appears to be an automatically generated IIS log file table. It's column sizes are larger and somewhat different than what I've read about the automatically generated IIS log table on http://support.microsoft.com/kb/296085.

    Changing the schema might prevent the logging from occurring (I have no idea of the SQL syntax IIS uses to insert log records).

    Making columns smaller (to get around the row width error) MIGHT still permit logging to occur, but you should test that on another server before performing the modifications.

    Creating an index on the date column is probably the single best index you can do if you need to search the data and get aggregate results.

    Here's a summary of other people's suggestions:

    In IIS:

    Turn off logging of all directories you do not need logged. Like the images folder, cgi folder, etc..This will reduce the growth of the table.

    Turn off logging of other items you don't need:

    -User-Agent is the browser that's accessing your web site. http://en.wikipedia.org/wiki/User_agent lists a few hundred. You can have dozens of versions of IE 6 so aggregating results of user agents from a SQL query is a pain/inefficient. Unfortunately, you might have to keep this column.

    -Referer is the url the user clicked on in order to get directed to your site. Do you need this? If not, don't log it.

    -cs-uri-query are url parameters after the question mark in a url. example: http://www.whatever.com/i.html?product=1&category=2. Don't log it if you don't need it.

    -cs-uri-stem is the directory the browser is accessing after .com/. example: http://www.whatever.com/products/widgets/index.html.

    -s-port is the port the browser is accessing. It's almost always 80 or 443 (for SSL). If you do not have SSL on your site or do not need to know the port, turn logging off.

    You can try deleting old log records in order to reduce query time. You might have several years worth of data in there. Since you have a lot of records, I'd recommend a batch delete as on http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=5&messageid=341605

    Having an index on the date column will greatly help the delete process. Deleting useless records as bnornberg suggested will help too. It's possible that 90% of the records are old or of no use if every http request is logged.

  • as you stated the error message "Warning: The table 'IISLog' has been created but its maximum row size (13625) exceeds the maximum number of bytes per row (8060). INSERT or UPDATE of a row in this table will fail if the resulting row length exceeds 8060 bytes."

    If you calculate your total table fields data type length which is close to 13625 as you have 3 fields of varchar(4096) & top of that bytes per INDEX is 900. consider changing the field length or chances of getting data truncated is very high.

Viewing 11 posts - 31 through 40 (of 40 total)

You must be logged in to reply to this topic. Login to reply