Audit Table - Indexes

  • I am looking at optimising the table below. The database instance is SQL Server, hosted in AWS. There are no indexes or primary keys on this table. The table is not linked to any others, so there are no foreign keys either, as they are not required at the moment.

    However, performance is very poor, particularly when running queries that filter on the [record_create_time] field.

    So, in the first instance, I was wondering if you could advise on creating an index on the [record_create_time] field. I have made this suggestion before, but was informed that it would probably not improve performance. An example of a query we might run is:

    select Request_id, message_id from audit where record_create_time > '2019-09-02' and record_create_time < '2019-09-03'

    For instance, this query would run for over 6 minutes and return about 40,000 records. The other problem we have is that while support are running queries against this table, the app is still trying to perform inserts and acquire connections; eventually the inserts get suspended and the connection limit is exceeded.

    The current number of records is ~12,599,332. Any thoughts, help, or advice would be greatly appreciated.

    This is the table (script):

    CREATE TABLE [dbo].[audit](
        [ID] [bigint] IDENTITY(1,1) NOT NULL,
        [request_id] [varchar](50) NULL,
        [message_id] [varchar](50) NULL,
        [correlation_id] [varchar](50) NULL,
        [log_type] [varchar](20) NOT NULL,
        [instance_thread] [varchar](1000) NOT NULL,
        [log_message] [varchar](max) NULL,
        [record_create_time] [datetime] NOT NULL,
        [message_receive_time] [datetime] NOT NULL,
        [flow_status] [smallint] NOT NULL
    )
    GO
  • Create a PK on all tables. Really, do this. There are exceptions, but 99.99% of the time, just make sure a PK exists.

    ALTER TABLE dbo.audit ADD CONSTRAINT auditPK PRIMARY KEY (ID)

    An index will help performance. Get an idea of which columns are used in WHERE clauses, and you might find a few indexes to add. For sure, you can do this:

    CREATE INDEX Audit_NDX_RecordCreateTime ON dbo.audit (record_create_time)

    I'd get the execution plan before you create this and after, and compare. Likely this will show you how SQL Server sees the queries, and you should see some better performance. However, you might actually try this:

    CREATE INDEX Audit_NDX_RecordCreateTime_Include_MsgID_ReqID
    ON dbo.audit (record_create_time)
    INCLUDE (
    message_id
    , request_id
    );

    This would perform better.
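    The before/after comparison can be made concrete with session statistics. A sketch (run in SSMS; enable "Include Actual Execution Plan" to see the seek vs. scan difference):

```sql
-- Compare logical reads and timings before/after creating the index.
SET STATISTICS IO, TIME ON;

SELECT request_id, message_id
FROM dbo.audit
WHERE record_create_time > '2019-09-02'
  AND record_create_time < '2019-09-03';

SET STATISTICS IO, TIME OFF;
```

    With a covering index in place, the plan should change from a table (heap) scan to an index seek, and logical reads should drop sharply.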

    I could also see you wanting a nonclustered index on message_id or request_id separately.
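    If the workload filters on those columns, the separate nonclustered indexes might look like this (a sketch; index names are illustrative):

```sql
CREATE INDEX Audit_NDX_MessageID ON dbo.audit (message_id);
CREATE INDEX Audit_NDX_RequestID ON dbo.audit (request_id);
```

    Only add these if queries actually filter on them; every extra index adds overhead to the app's inserts.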

  • Thank you for your response, Steve. Will do and let you know the outcome. Thanks again.

    First, be VERY aware that if you add the recommended PK (and you absolutely should), it WILL make a copy of the entire table before dropping the original heap (this occurs for any index > 128 extents, which is just 8 MB). This is a fairly large table, and you need to make sure that the underlying MDF file 1) has enough space to do that and 2) you don't mind having that much free space allocated to the table when the original heap is dropped.
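    To check whether the data file has room for that copy, a sketch using the standard catalog views (run in the database that holds the table):

```sql
-- Allocated size vs. used space per database file, in MB.
-- size and FILEPROPERTY(..., 'SpaceUsed') are both in 8 KB pages.
SELECT  name,
        size / 128                                                  AS SizeMB,
        (size - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)) / 128 AS FreeMB
FROM    sys.database_files;
```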

    Second, be very aware that if you add the recommended PK while in the FULL recovery model, your log file is going to take a major hit and, possibly, also expand much more than you planned on.  What you might want to do to avoid some of this is make a trip to the BULK LOGGED recovery model to do your PK (a Clustered Index) and Non-Clustered Indexes (yeah... you should add these, as well) and then return to the FULL recovery model.  Index builds are "minimally logged" in the BULK LOGGED and SIMPLE recovery models (don't use SIMPLE for this... it WILL break the log file chain) whereas they are fully logged in the FULL recovery model.
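    That round trip would look something like this (a sketch; YourDb is a placeholder, and you should take log backups around the operation):

```sql
ALTER DATABASE YourDb SET RECOVERY BULK_LOGGED;

-- ... build the clustered PK and nonclustered indexes here ...

ALTER DATABASE YourDb SET RECOVERY FULL;
-- Take a log backup now: a log backup containing minimally logged
-- operations can only be restored in full, not to a point in time.
```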

    Shifting gears a bit, even if you properly index the table (and you absolutely should), you have the following column....

    [log_message] [varchar](MAX) NULL,

    ... and that can cause massive issues with "page density" in that SQL Server defaults to storing the LOBS "In Row if they fit" and THAT can result in huge numbers of "Trapped Short Rows" resulting in many, many pages having a page density as low as 3% (wasting 97% of your memory when the pages are loaded, for example).
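    If the diagnostic below confirms many mid-sized messages trapped in-row, one common mitigation (a sketch; measure first, and note that rewriting 12M+ rows is a heavy operation) is to push the LOB values off-row:

```sql
-- Store varchar(max) values off-row instead of "in row if they fit".
EXEC sp_tableoption 'dbo.audit', 'large value types out of row', 1;

-- Existing rows are not moved until their LOB column is next updated;
-- an in-place rewrite forces it (log-heavy, so batch it):
-- UPDATE dbo.audit SET log_message = log_message;
```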

    To see if that problem may be present, please execute the following code and report the results.

     SELECT  MinLen = MIN(DATALENGTH(log_message))
    ,AvgLen = AVG(DATALENGTH(log_message))
    ,MaxLen = MAX(DATALENGTH(log_message))
    ,Len0K = SUM(CASE WHEN LEN(log_message)=0 OR log_message IS NULL THEN 1 ELSE 0 END) --Intentionally different here.
    ,Len1k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 1 AND 1000 THEN 1 ELSE 0 END)
    ,Len2k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 1001 AND 2000 THEN 1 ELSE 0 END)
    ,Len3k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 2001 AND 3000 THEN 1 ELSE 0 END)
    ,Len4k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 3001 AND 4000 THEN 1 ELSE 0 END)
    ,Len5k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 4001 AND 5000 THEN 1 ELSE 0 END)
    ,Len6k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 5001 AND 6000 THEN 1 ELSE 0 END)
    ,Len7k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 6001 AND 7000 THEN 1 ELSE 0 END)
    ,Len8k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 7001 AND 8000 THEN 1 ELSE 0 END)
    ,LenOOR = SUM(CASE WHEN DATALENGTH(log_message) > 8000 THEN 1 ELSE 0 END)
    ,Trail = SUM(CASE WHEN ISNULL(DATALENGTH(log_message),0)<>ISNULL(LEN(log_message),0) THEN 1 ELSE 0 END)
    FROM dbo.audit
    ;

    • This reply was modified 5 years, 3 months ago by Jeff Moden. Reason: Made some minor corrections caught by Des Norton below. Thanks for the catches.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    To see if that problem may be present, please execute the following code and report the results.

     SELECT  MinLen = MIN(DATALENGTH(log_message))
    ,AvgLen = AVG(DATALENGTH(log_message))
    ,MaxLen = AVG(DATALENGTH(log_message))
    ,Len0K = SUM(CASE WHEN LEN(log_message)=0 OR log_message IS NULL THEN 1 ELSE 0 END) --Intentionally different here.
    ,Len1k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 1 AND 1000 THEN 1 ELSE 0 END)
    ,Len2k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 1001 AND 2000 THEN 1 ELSE 0 END)
    ,Len3k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 2001 AND 3000 THEN 1 ELSE 0 END)
    ,Len4k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 3001 AND 4000 THEN 1 ELSE 0 END)
    ,Len5k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 4001 AND 5000 THEN 1 ELSE 0 END)
    ,Len6k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 5001 AND 6000 THEN 1 ELSE 0 END)
    ,Len7k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 6001 AND 7000 THEN 1 ELSE 0 END)
    ,Len8k = SUM(CASE WHEN DATALENGTH(log_message) BETWEEN 7001 AND 8000 THEN 1 ELSE 0 END)
    ,LenOOR = SUM(CASE WHEN DATALENGTH(log_message) > 8000 THEN 1 ELSE 0 END)
    ,Trail = SUM(CASE WHEN ISNULL(DATALENGTH(log_message),'')<>ISNULL(LEN(log_message),'') THEN 1 ELSE 0 END)
    FROM dbo.audit
    ;

    Jeff

    I believe that

    ,MaxLen = AVG(DATALENGTH(log_message))

    is a typo, and should in fact be

    ,MaxLen = MAX(DATALENGTH(log_message))

     

    I am trying to understand the rationale of this piece

    ,Trail  = SUM(CASE WHEN ISNULL(DATALENGTH(log_message),'')<>ISNULL(LEN(log_message),'') THEN 1 ELSE 0 END)

    Both DATALENGTH and LEN return int values.  Is there a specific reason for supplying an empty string for the 2nd parameter of ISNULL() instead of 0?

  • I'm not sure how many other queries you run on that table, but if this is the main query, you would get much better performance by creating a unique clustered index on the columns (record_create_time, ID).
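    A sketch of that suggestion (building a clustered index on a heap this size rewrites the whole table, so the space and logging caveats earlier in the thread apply):

```sql
CREATE UNIQUE CLUSTERED INDEX Audit_CDX_RecordCreateTime_ID
ON dbo.audit (record_create_time, ID);
```

    The date-range query then becomes a clustered index seek with no lookups; the PK on ID could still be added as a NONCLUSTERED constraint.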

  • @desnorton,

    Correct.  The copy'n'paste error for AVG vs MAX you pointed out needs to be fixed/changed to MAX.

    On the "Trail" formula, good eyes on your part.  The empty-string notation should have been 0s, even though it works as is.  I made a last-minute change and didn't double-check the form of the code.  Thanks for the catch, especially since it eliminates an implicit conversion.

    I've updated the code I posted.  Again, thanks for the catches.

     

    --Jeff Moden



  • Heh... there must be an "Audit" disturbance in the "Force".  We have another AUDIT table with a LOB in it.  Since the OP for this thread hasn't returned, I'm just going to post the link to my answer on a different thread by a different OP.

    https://www.sqlservercentral.com/forums/topic/index-question-21#post-3677826

    --Jeff Moden


