What to do to improve a table?

  • We have in our database a table that serves as a log for the server administrators.

    The server administrators run audits and send the output of the audit runs to be stored in this table.

    It is growing fast and I am looking into ideas of how to improve it.

    I can set an archive job to archive rows that are n days old.

    What about indexes? That table is queried a lot by customer and by server name. That table currently has no indexes, no primary keys.

    The only 2 activities that happen on that table are:

    1. insert audit output by customer and server name

    2. query the audit output to display dashboard statuses

    Columns on that log table:

    server varchar (50)

    customer varchar(50)

    audit_output ntext

    timestamp datetime

    Any suggestions will be greatly appreciated.

  • Create a non-clustered index on columns server, customer and add audit_output as include column.

    --Ramesh


  • As first suggestions... These may not be completely optimal, as I don't hav your queries, I don't know the data distribution and I can't see the exec plans

    Clustered index on timestamp

    nonclustered index on server, customer

    nonclustered index on customer, INCLUDE server

    The two NC depends. Are queries done by customer name or by server, or are there queries that filter on both? I'm assuming the latter.

    Change the data type of audit_output to nvarchar(max) as its the 2005 replacement for ntext

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ramesh (12/7/2007)


    Create a non-clustered index on columns server, customer and add audit_output as include column.

    audit_output is ntext, and as such can't be used as an include column.

    Even if changed to nvarchar(max) (if it can can) I don't think I'd recommend adding it to the include, as adding a blob to the include of an index results in it been stored twice (the entire blob, not the pointer to it)

    Edit: I can't spell or type today

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • How could I miss that?:hehe:

    Thanks Gail for the catch. I do agree with what you have suggested.

    Adding included varchar(max), nvarchar(max), varbinary(max), or xml (max) columns will increase the space required, due to the values stored not only in base table but also in the index leaf pages. This may significantly increase the time required to perform DML operations. And also you cannot rebuild such indexes in online mode.

    --Ramesh


  • Thank you both for the suggestions.

    As for queries, they are based either on server or both parms - by customer name and server.

    I am a novice and have a minimal knowledge of sql server indexes.

    Can you point me to an article that can throw some light on what INCLUDE is?

    Thanks again.

  • Include adds data to the index so that a bookmark lookup is not needed (now a clustered index seek) to go back to the table for the data. Good for wider tables, not sure it's good here. Not enough data in the table.

    Does you query often use the audit_output in the result set? Or do you return the other data and then let the user select the row to see the audit_output? Is there ordering (like by date)?

    The reason I ask is that it will change what I'd recommend. If you order by date, or query by date, I would tend to recommend this as the clustered index. For two reasons.

    - One is that the data will be in order and doesn't need to be sorted, saving tempdb space, could limit the searches by the query optimizer.

    - No page splits as you add data. You tend to add in date order (it's an audit), so the rows always go to the end of the last page/extent, or a new one. This could cause a hot spot in a high volume system, but I think that's better than tons of page splits, which can cause fragmentation.

    A non-clustered on server/customer if you include that in the WHERE clause often, then these can go through a seek quickly. If you include the datetime, I might even add that as a third column or an include.

    If you pull back some of the data, like 100 char or so of the audit_output, then I'd add another column to the table, maybe even a computed column, and put that data (first 100 char) in it. Then use that as an INCLUDE in your index.

    You want smallest indexes so they scan quickly as a general rule. However you want them to be effective.

  • Hi Steve,

    Does you query often use the audit_output in the result set? Yes

    These are the queries(from most frequent to least frequent):

    1. Administrators sit at their computers, run the audits on selected servers,

    ( which causes an insert to the log table) and view the audit_output to find out what

    their next steps will be. This insert causes inserts and updates on another table

    called dashboard table.

    The dashboard table is populated by "scanning" the audit_output text for statuses

    (failures, success) etc... The scanning is performed with Charindex looking for specific strings.

    This dashboard table is populated in a cumulative manner

    (ie Monday: audit hits item1, item2 dashboard is updated for item1 and item2,

    Wednesday: audit hits item5, dashboard is updated for item5, Thursday: audit hits items 10-14, dashboard table gets updated for items 10-14, etc...) There are 14 items on the dashboard table( all data type int mapping to a Red/Yellow/Green status).

    2. Customer representatives query by customers and servers that belong to that customer by looking at the dashboard ( aggregate by customers/servers). Occasionally, they may be interested in the finer details of the statuses so they can drill through a web page that display the audit_output history( ie by timestamp in desc order)

    One thing about the audit_ouput: it is large because the administrators pull a lot of server data. Sometimes it is voluminous because of errors in coding(infinite loops!!!).

    For that reason, I think keeping the indexes small makes sense.

    Thanks

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply