Audit Table - Indexes

  • I am looking at optimising the table below. The database instance is SQL Server, hosted in AWS. There are no indexes or primary keys on this table. The table is not linked to any others, so there are no foreign keys either, as they are not required at the moment.

    However, performance is very poor, particularly when running queries that filter on the [record_create_time] field.

    So, in the first instance, I was wondering if you could advise on creating an index on the [record_create_time] field. I have made this suggestion before, but was informed that it would probably not improve performance. An example of a query we might run is:

    select Request_id, message_id from audit where record_create_time > '2019-09-02' and record_create_time < '2019-09-03'

    For instance, this query would run for over 6 minutes and return about 40,000 records. The other problem we have is that while support are running queries against this table, the app is still trying to perform inserts and acquire connections; eventually the inserts get suspended and the connection limit is exceeded.

    The current number of records is ~12,599,332. Any thoughts, help, or advice would be greatly appreciated.

    This is the table (script):

    CREATE TABLE [dbo].[audit](
        [ID] [bigint] IDENTITY(1,1) NOT NULL,
        [request_id] [varchar](50) NULL,
        [message_id] [varchar](50) NULL,
        [correlation_id] [varchar](50) NULL,
        [log_type] [varchar](20) NOT NULL,
        [instance_thread] [varchar](1000) NOT NULL,
        [log_message] [varchar](max) NULL,
        [record_create_time] [datetime] NOT NULL,
        [message_receive_time] [datetime] NOT NULL,
        [flow_status] [smallint] NOT NULL
    )
    GO
  • Create a PK on all tables. Really, do this. There are exceptions, but 99.99% of the time, just make sure a PK exists.

    ALTER TABLE dbo.audit ADD CONSTRAINT auditPK PRIMARY KEY (ID)

    An index will help performance. Get an idea of which columns are used in WHERE clauses, and you might find a few indexes to add. For sure, you can do this:

    CREATE INDEX Audit_NDX_RecordCreateTime ON dbo.audit (record_create_time)

    I'd get the execution plan before you create this and after, and compare. Likely this will show you how SQL Server sees the queries, and you should see some better performance. However, you might actually try this:

    CREATE INDEX Audit_NDX_RecordCreateTime_Include_MsgID_ReqID
    ON dbo.audit (record_create_time)
    INCLUDE (
    message_id
    , request_id
    );

    This would perform better.
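    The before/after comparison can be made concrete with session statistics. A sketch (run in SSMS; enable "Include Actual Execution Plan" to see the seek vs. scan difference):

```sql
-- Compare logical reads and timings before/after creating the index.
SET STATISTICS IO, TIME ON;

SELECT request_id, message_id
FROM dbo.audit
WHERE record_create_time > '2019-09-02'
  AND record_create_time < '2019-09-03';

SET STATISTICS IO, TIME OFF;
```

    With a covering index in place, the plan should change from a table (heap) scan to an index seek, and logical reads should drop sharply.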

    I could also see you wanting a nonclustered index on message_id or request_id separately.
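    If the workload filters on those columns, the separate nonclustered indexes might look like this (a sketch; index names are illustrative):

```sql
CREATE INDEX Audit_NDX_MessageID ON dbo.audit (message_id);
CREATE INDEX Audit_NDX_RequestID ON dbo.audit (request_id);
```

    Only add these if queries actually filter on them; every extra index adds overhead to the app's inserts.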

  • Thank you for your response, Steve. Will do and let you know the outcome. Thanks again.

    First, be VERY aware that if you add the recommended PK (and you absolutely should), it WILL make a copy of the entire table before dropping the original heap (this occurs for any index > 128 extents, which is just 8 MB). This is a fairly large table, and you need to make sure that the underlying MDF file 1) has enough space to do that and 2) you don't mind having that much free space allocated to the table when the original heap is dropped.
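    To check whether the data file has room for that copy, a sketch using the standard catalog views (run in the database that holds the table):

```sql
-- Allocated size vs. used space per database file, in MB.
-- size and FILEPROPERTY(..., 'SpaceUsed') are both in 8 KB pages.
SELECT  name,
        size / 128                                                  AS SizeMB,
        (size - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)) / 128 AS FreeMB
FROM    sys.database_files;
```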

    Second, be very aware that if you add the recommended PK while in the FULL recovery model, your log file is going to take a major hit and, possibly, also expand much more than you planned on.  What you might want to do to avoid some of this is make a trip to the BULK LOGGED recovery model to do your PK (a Clustered Index) and Non-Clustered Indexes (yeah... you should add these, as well) and then return to the FULL recovery model.  Index builds are "minimally logged" in the BULK LOGGED and SIMPLE recovery models (don't use SIMPLE for this... it WILL break the log file chain) whereas they are fully logged in the FULL recovery model.
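    That round trip would look something like this (a sketch; YourDb is a placeholder, and you should take log backups around the operation):

```sql
ALTER DATABASE YourDb SET RECOVERY BULK_LOGGED;

-- ... build the clustered PK and nonclustered indexes here ...

ALTER DATABASE YourDb SET RECOVERY FULL;
-- Take a log backup now: a log backup containing minimally logged
-- operations can only be restored in full, not to a point in time.
```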

    Shifting gears a bit, even if you properly index the table (and you absolutely should), you have the following column....

    [log_message] [varchar](MAX) NULL,

    ... and that can cause massive issues with "page density" in that SQL Server defaults to storing the LOBS "In Row if they fit" and THAT can result in huge numbers of "Trapped Short Rows" resulting in many, many pages having a page density as low as 3% (wasting 97% of your memory when the pages are loaded, for example).
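    If the diagnostic below confirms many mid-sized messages trapped in-row, one common mitigation (a sketch; measure first, and note that rewriting 12M+ rows is a heavy operation) is to push the LOB values off-row:

```sql
-- Store varchar(max) values off-row instead of "in row if they fit".
EXEC sp_tableoption 'dbo.audit', 'large value types out of row', 1;

-- Existing rows are not moved until their LOB column is next updated;
-- an in-place rewrite forces it (log-heavy, so batch it):
-- UPDATE dbo.audit SET log_message = log_message;
```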

    To see if that problem may be present, please execute the following code and report the results.

     SELECT  MinLen = MIN(DATALENGTH(log_message))
    ,AvgLen = AVG(DATALENGTH(log_message))
    ,MaxLen = MAX(DATALENGTH(log_message))
    ,Len0K = SUM(CASE WHEN LEN(log_message)=0 OR log_message IS NULL THEN 1 ELSE 0 END) --Intentionally different here.
    ,Len1k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 1 AND 1000 THEN 1 ELSE 0 END)
    ,Len2k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 1001 AND 2000 THEN 1 ELSE 0 END)
    ,Len3k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 2001 AND 3000 THEN 1 ELSE 0 END)
    ,Len4k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 3001 AND 4000 THEN 1 ELSE 0 END)
    ,Len5k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 4001 AND 5000 THEN 1 ELSE 0 END)
    ,Len6k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 5001 AND 6000 THEN 1 ELSE 0 END)
    ,Len7k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 6001 AND 7000 THEN 1 ELSE 0 END)
    ,Len8k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 7001 AND 8000 THEN 1 ELSE 0 END)
    ,LenOOR = SUM(CASE WHEN DATALENGTH(log_message) > 8000 THEN 1 ELSE 0 END)
    ,Trail = SUM(CASE WHEN ISNULL(DATALENGTH(log_message),0)<>ISNULL(LEN(log_message),0) THEN 1 ELSE 0 END)
    FROM dbo.audit
    ;

    • This reply was modified 5 years, 3 months ago by Jeff Moden. Reason: Made some minor corrections caught by Des Norton below. Thanks for the catches.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    To see if that problem may be present, please execute the following code and report the results.

     SELECT  MinLen = MIN(DATALENGTH(log_message))
    ,AvgLen = AVG(DATALENGTH(log_message))
    ,MaxLen = AVG(DATALENGTH(log_message))
    ,Len0K = SUM(CASE WHEN LEN(log_message)=0 OR log_message IS NULL THEN 1 ELSE 0 END) --Intentionally different here.
    ,Len1k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 1 AND 1000 THEN 1 ELSE 0 END)
    ,Len2k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 1001 AND 2000 THEN 1 ELSE 0 END)
    ,Len3k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 2001 AND 3000 THEN 1 ELSE 0 END)
    ,Len4k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 3001 AND 4000 THEN 1 ELSE 0 END)
    ,Len5k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 4001 AND 5000 THEN 1 ELSE 0 END)
    ,Len6k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 5001 AND 6000 THEN 1 ELSE 0 END)
    ,Len7k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 6001 AND 7000 THEN 1 ELSE 0 END)
    ,Len8k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 7001 AND 8000 THEN 1 ELSE 0 END)
    ,LenOOR = SUM(CASE WHEN DATALENGTH(log_message) > 8000 THEN 1 ELSE 0 END)
    ,Trail = SUM(CASE WHEN ISNULL(DATALENGTH(log_message),'')<>ISNULL(LEN(log_message),'') THEN 1 ELSE 0 END)
    FROM dbo.audit
    ;

    Jeff

    I believe that

    ,MaxLen = AVG(DATALENGTH(log_message))

    is a typo, and should in fact be

    ,MaxLen = MAX(DATALENGTH(log_message))

     

    I am trying to understand the rationale of this piece

    ,Trail  = SUM(CASE WHEN ISNULL(DATALENGTH(log_message),'')<>ISNULL(LEN(log_message),'') THEN 1 ELSE 0 END)

    Both DATALENGTH and LEN return int values.  Is there a specific reason for supplying an empty string for the 2nd parameter of ISNULL() instead of 0?

  • I'm not sure how many other queries you run on that table, but if this is the main query, you would get much better performance by creating a unique clustered index on the columns (record_create_time, ID).
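    A sketch of that suggestion (building a clustered index on a heap this size rewrites the whole table, so the space and logging caveats earlier in the thread apply):

```sql
CREATE UNIQUE CLUSTERED INDEX Audit_CDX_RecordCreateTime_ID
ON dbo.audit (record_create_time, ID);
```

    The date-range query then becomes a clustered index seek with no lookups; the PK on ID could still be added as a NONCLUSTERED constraint.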

  • @desnorton,

    Correct.  The copy'n'paste error for AVG vs MAX you pointed out needs to be fixed/changed to MAX.

    On the "Trail" formula, good eyes on your part.  The empty-string notation should have been 0s, even though it works as is.  I made a last-minute change and didn't double-check the form of the code.  Thanks for the catch, especially since it eliminates an implicit conversion.

    I've updated the code I posted.  Again, thanks for the catches.

     

    --Jeff Moden



  • Heh... there must be an "Audit" disturbance in the "Force".  We have another AUDIT table with a LOB in it.  Since the OP for this thread hasn't returned, I'm just going to post the link to my answer on a different thread by a different OP.

    https://www.sqlservercentral.com/forums/topic/index-question-21#post-3677826

    --Jeff Moden


