August 12, 2004 at 9:21 am
Hi,
We have a logging table which is defined by
CREATE TABLE LOG (
LogID uniqueidentifier NOT NULL ,
MessageID uniqueidentifier NOT NULL ,
DatumTijd datetime NOT NULL ,
LogType char (3) NOT NULL ,
Component varchar (256) NOT NULL ,
Class varchar (256) NOT NULL ,
Method varchar (256) NOT NULL ,
MachineName varchar (50) NOT NULL ,
ShortMessage varchar (1000) NOT NULL ,
MessageDetails text NULL ,
InvoerDatumTijd datetime NULL ,
CONSTRAINT PK_LOG PRIMARY KEY NONCLUSTERED
(
LogID
) ON [PRIMARY] ,
CONSTRAINT FK_LOG_BERICHTEN FOREIGN KEY
(
MessageID
) REFERENCES BERICHTEN (
MessageID
)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
alter table dbo.LOG nocheck constraint FK_LOG_BERICHTEN
GO
CREATE CLUSTERED INDEX IX_Invoerdatumtijd ON dbo.LOG(InvoerDatumTijd) ON [PRIMARY]
GO
I removed the collation attributes for readability; they're all SQL_Latin1_General_CP1_CI_AS.
We notice that the duration of a straightforward insert (one record at a time) increases linearly with the number of records. When the table contains over 4,000,000 records, an insert takes more than 30 seconds, at which point the .NET SqlCommand object times out (the default timeout; changing it in code is not an option for us). The transaction isolation level is set to "read committed".
Profiler tells us that the insert is a "Clustered index insert".
This is not my design, by the way.
Now my question: is it inevitable that inserts on a large table with a clustered index gradually take more time as the table grows, or is there a flaw in the design that, once fixed, would prevent the increase or at least make it non-linear?
Any help will be greatly appreciated.
Cheers,
Henk
August 12, 2004 at 10:09 am
First I'd just try a non-clustered index, since it's a log table, but my guess is that the same issue will occur. More likely I'd change the uniqueidentifier datatype to something like "bigint identity", which is always ascending, thus keeping the 'hot spot' of your insert-only table at the end. A 'uniqueidentifier' is just that: unique, not guaranteed to be ascending. Maybe your clustered index is splitting? Maybe you are scanning the index on every insert looking for an open slot?
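For what it's worth, the sort of change I have in mind would look roughly like this (LogSeq and IX_LOG_LogSeq are made-up names, and this is untested against your schema):
-- Sketch only: add an ever-ascending identity column and cluster on it,
-- so new rows always go to the end of the table. On a 4,000,000-row table
-- both steps take time and space, so do this in a maintenance window.
ALTER TABLE dbo.[LOG] ADD LogSeq bigint IDENTITY (1, 1)
GO
DROP INDEX [LOG].IX_Invoerdatumtijd  -- only one clustered index per table
GO
CREATE UNIQUE CLUSTERED INDEX IX_LOG_LogSeq ON dbo.[LOG] (LogSeq) ON [PRIMARY]
GO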
August 13, 2004 at 10:12 am
Assuming your application is using ADO to insert data, make sure inserts are done with disconnected recordsets. I've had cases where a developer would just open a table and insert a record over one connection. That doesn't work in real life with large tables like yours.
August 15, 2004 at 8:23 pm
Would "InvoerDatumTijd" have the same value for every row in the batch ? I have seen high-ratio keys cause problems in other databases, but I thought SQLServer handled it, maybe not. Make it nonclustered, and insert another column to make it more unique, eg
(InvoerDatumTijd, LogID)
If InvoerDatumTijd has few distinct values, it might not be worth having an index at all. Each distinct value must select <5% of the table to be useful.
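Roughly something like this (the index name is just an example, and the existing clustered index would have to go first):
-- Rough sketch of the suggestion above
DROP INDEX [LOG].IX_Invoerdatumtijd
GO
CREATE NONCLUSTERED INDEX IX_InvoerDatumTijd_LogID
ON dbo.[LOG] (InvoerDatumTijd, LogID) ON [PRIMARY]
GO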
August 16, 2004 at 10:59 pm
Henk,
The problem with CLUSTERED indexes is that they live in the data and should just not be used for transactional tables. Inserting into the "middle" of a 4 million record table will cause all of the data to "shift down" so the new data can be inserted, with the clustered index, in the proper spot in the data.
I think you'd be much better off using a non-clustered index. If the table is highly transactional, you may want to set the FILLFACTOR for the index to something low like 40 or 50%. Then, you'll need to schedule some maintenance time for DBCC DBREINDEX.
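Something along these lines (fill factor and names are examples only, assuming the clustered index has already been replaced):
-- Example only: non-clustered index built with a low fill factor...
CREATE NONCLUSTERED INDEX IX_Invoerdatumtijd
ON dbo.[LOG] (InvoerDatumTijd) WITH FILLFACTOR = 50 ON [PRIMARY]
GO
-- ...and a periodic rebuild (e.g. from a weekly job) to restore that fill factor
DBCC DBREINDEX ('LOG', 'IX_Invoerdatumtijd', 50)
GO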
And do use indexes on tables of that size... don't go nuts, though, because too many indexes can bog your code down, as well. Let the Index Tuning Wizard do some work for some of your queries... I've found that it's one of the few things that Microsoft actually did right (dunno, they may have bought it from somebody)
--Jeff Moden
Change is inevitable... Change for the better is not.
August 17, 2004 at 6:36 am
Hi all,
First: thank you all for your help; you've given me some useful information. Reading the previous posts, I also realized that I left out some important information, which I will provide now:
I'm starting to believe that there was a (dead?) locking issue, which we solved in the process of creating the shadow table and renaming the original and shadow tables and indexes.
Is this possible? I cannot see what could be causing deadlocks in the stored procedure code. I'm also pretty sure that this is the only code that's executed against the table. (A bit weird: using the "Display dependencies..." function in Enterprise Manager on either the stored procedure or the table does not show the dependency between the table and the stored procedure; is this related to using a reserved word as a table name?)
I'm also wondering if heavy fragmentation in the table/PK index/text storage was causing the problem. I haven't tried to analyze this as I don't know how.
Any thoughts or suggestions remain welcome.
Cheers,
Henk
Stored procedure code:
CREATE PROCEDURE SAVE_LOG
(@LogID_pr uniqueidentifier,
@MessageID_pr uniqueIdentifier,
@DatumTijd_pr datetime,
@LogType_pr char(3),
@Component_pr varchar(256),
@Class_pr varchar(256),
@Method_pr varchar(256),
@MachineName_pr varchar(50),
@ShortMessage_pr varchar(1000),
@MessageDetails_pr text
)
AS
insert into [LOG]
(LogID, MessageID, DatumTijd, LogType, Component, Class, Method, MachineName, ShortMessage, MessageDetails, InvoerDatumTijd)
values
(@LogID_pr, @MessageID_pr, @DatumTijd_pr, @LogType_pr, @Component_pr, @Class_pr, @Method_pr, @MachineName_pr, @ShortMessage_pr,
@MessageDetails_pr, GetDate())
GO
August 19, 2004 at 8:21 pm
DBCC SHOWCONTIG will provide fragmentation information. Look at the other DBCC commands as well: one does an online index defrag (DBCC INDEXDEFRAG) and another does a more offline index rebuild (DBCC DBREINDEX, already mentioned by Jeff Moden). Frankly, I don't think an INSERT into an append-only log table should even register time (even 30ms seems very high to me).
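For example, something like this (the index name is assumed; the defrag/rebuild calls are illustrations only and belong in a maintenance window):
DBCC SHOWCONTIG ('LOG') WITH ALL_INDEXES          -- look at Scan Density and Logical Scan Fragmentation
GO
DBCC INDEXDEFRAG (0, 'LOG', 'IX_Invoerdatumtijd')  -- online defragmentation, current database
GO
DBCC DBREINDEX ('LOG')                             -- offline rebuild of all indexes on the table
GO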
Anyhow, I believe the real problem is that you are probably performing a complete index scan with every insert. I'm guessing that despite the data being ordered, SQL Server doesn't actually know that the value you are inserting is larger than every other value. Therefore, I think it runs a top-down index scan until it reaches the very bottom and says, "oh, I'm appending to the end". The more rows in the table, the longer it takes to scan the index, hence the linear degradation. Conversely, the smaller the table, the smaller the index and the more of it can be cached, so the times likely vary based on how much, if any, of that clustered index was sitting cached in RAM at the time of the insert.
As for a solution, turn InvoerDatumTijd into a non-indexed data column (since I expect it's actually useful to keep around) and add a UID with or without a clustered index. SQL Server should know (I'm assuming here) that a new UID will be last and therefore jump to the end of the index immediately. If not, make the UID index DESC. I'm not sure if reverse ordering the UID will cause fragmentation; I'm guessing not, since it would be faster to push the records down in the current page until full and then start a new page than to create a page split.
Show the execution plan of an insert within Query Analyzer. Although I don't recall looking at the plan of an INSERT before, I'm assuming it will show the access path, including index usage. If memory serves (I don't have SQL access at the moment), one of the index statistics reported is rows examined, either explicitly or as a percentage. If my beliefs are true, it should report that you are scanning the entire index on every insert right now, and that converting to a UID would avoid that.
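Something like this in Query Analyzer should do it (all the values below are placeholders; with SHOWPLAN_TEXT on, the statement is compiled and its plan returned, but it is not actually executed):
SET SHOWPLAN_TEXT ON
GO
INSERT INTO [LOG]
(LogID, MessageID, DatumTijd, LogType, Component, Class, Method, MachineName, ShortMessage, MessageDetails, InvoerDatumTijd)
VALUES
(NEWID(), NEWID(), GETDATE(), 'ERR', 'TestComponent', 'TestClass', 'TestMethod', 'TESTBOX', 'test message', 'test details', GETDATE())
GO
SET SHOWPLAN_TEXT OFF
GO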
Good luck.
August 20, 2004 at 12:34 am
Lots of good ideas by lots of good folks... still, I think the clustered index is eating your insert time alive. Try removing all clustered indexes and adding non-clustered indexes in their place (trust me). If you are going to add lots of records per week, try using a low fill factor of 60 to 80 and don't forget to reindex the table once a week.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 20, 2004 at 4:10 am
Jeff is absolutely correct.
Test of 20000 inserts on a table using uniqueidentifier as the key.
Clustered index = 6min 20secs
Heap + unique index on uniqueidentifier (fill factor 80%) = 3min 38secs
Heap + unique index on uniqueidentifier (fill factor 50%) = 0mins 41secs
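Roughly the kind of setup that means, stripped down (table and index names made up for the example, not my exact test script):
-- Heap (no clustered index) plus a unique non-clustered index on the
-- uniqueidentifier key, built with a 50% fill factor
CREATE TABLE LOG_TEST (
LogID uniqueidentifier NOT NULL ,
InvoerDatumTijd datetime NOT NULL
) ON [PRIMARY]
GO
CREATE UNIQUE NONCLUSTERED INDEX IX_LOG_TEST_LogID
ON LOG_TEST (LogID) WITH FILLFACTOR = 50 ON [PRIMARY]
GO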
August 20, 2004 at 9:27 am
Hi all,
Thanks again for all your help and suggestions. We will surely try to replace the clustered index with a non-clustered one. As we know that the datetime value of the "InvoerDatumTijd" column in an inserted record will never be smaller than any existing value, we will keep the fill factor on that index high, but we will set the fill factor on the non-clustered primary key index (LogID uniqueidentifier) to something like 50%. We must wait for a suitable moment to make these changes, as this is a heavily used production environment.
At this moment the table contains approximately 1,700,000 records and insert times have increased to several seconds.
I'll keep you posted.
Cheers,
Henk