May 26, 2004 at 12:40 pm
An application retrieves data from power monitoring devices and inserts it into a SQL 2000/SP3 database. All data from the devices is contained in four tables – DataLog, DataLogStamp, WaveformLog, EventLog. From time to time it is necessary to archive, and then trim out old data to keep the database to a manageable size. We use a stored procedure (TrimRecords) to do this, but the performance is extremely poor. In addition, the data insertion application is prevented from inserting any new data while TrimRecords is running. Does anyone have any suggestions as to how to rework TrimRecords to improve performance? Is there any way to allow data insertion while old data is being trimmed?
Here is the database schema, including TrimRecords:
/*****************************************************************************/
/************ CREATE TABLES **************************************************/
/*****************************************************************************/
CREATE TABLE [dbo].[DataLog] (
[DataLogStampID] [int] NOT NULL ,
[Value] [float] NULL ,
[QuantityID] [smallint] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DataLogStamp] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[SourceID] [smallint] NOT NULL ,
[TimestampUTC] [datetime] NOT NULL ,
[TimestampSourceLT] [datetime] NOT NULL ,
[FractionOfASecond] [real] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[EventLog] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[SourceID] [smallint] NOT NULL ,
[TimestampUTC] [datetime] NOT NULL ,
[TimestampSourceLT] [datetime] NOT NULL ,
[FractionOfASecond] [real] NOT NULL ,
[Priority] [tinyint] NULL ,
[CauseID] [int] NULL ,
[CauseValueID] [int] NULL ,
[EffectID] [int] NULL ,
[EffectValueID] [int] NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[EventString] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[String] [nvarchar] (3998) COLLATE Latin1_General_CS_AI NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[EventAcknowledgement] (
[EventLogID] [int] NOT NULL ,
[AckTimeUTC] [datetime] NOT NULL ,
[UserName] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Remark] [nvarchar] (2000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Namespace] (
[ID] [smallint] IDENTITY (1, 1) NOT NULL ,
[Name] [nvarchar] (400) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Description] [nvarchar] (2000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Quantity] (
[ID] [smallint] IDENTITY (10000, 1) NOT NULL ,
[Name] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Base] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Unit] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Phase] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Type] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Direction] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Misc] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Description] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Source] (
[ID] [smallint] IDENTITY (1, 1) NOT NULL ,
[Name] [nvarchar] (250) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[NamespaceID] [smallint] NOT NULL ,
[SourceTypeID] [smallint] NOT NULL ,
[TimeZoneID] [smallint] NULL ,
[Description] [nvarchar] (2000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Signature] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[SourceType] (
[ID] [smallint] IDENTITY (1, 1) NOT NULL ,
[Name] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[TimeZone] (
[ID] [smallint] IDENTITY (1, 1) NOT NULL ,
[DisplayName] [nvarchar] (400) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Bias] [smallint] NOT NULL ,
[DaylightBias] [smallint] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[WaveformLog] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[SourceID] [smallint] NOT NULL ,
[TimestampUTC] [datetime] NOT NULL ,
[TimestampSourceLT] [datetime] NOT NULL ,
[FractionOfASecond] [real] NOT NULL ,
[QuantityID] [smallint] NOT NULL ,
[TimeOfTriggerUTC] [datetime] NOT NULL ,
[TriggerFracOfASec] [real] NOT NULL ,
[TimeOfFirstPointUTC] [datetime] NOT NULL ,
[FirstFracOfASec] [real] NOT NULL ,
[SamplingFrequency] [float] NULL ,
[Scale] [float] NULL ,
[Offset] [int] NULL ,
[DataPointType] [nvarchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Samples] [varbinary] (7900) NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Registry] (
[Machine] [nvarchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Name] [nvarchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Value] [nvarchar] (1000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ReadOnly] [bit] NOT NULL ,
[Description] [nvarchar] (1000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[PhaseLabelType] (
[ID] [tinyint] IDENTITY (1, 1) NOT NULL ,
[PhaseLabels] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Description] [nvarchar] (500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
/*****************************************************************************/
/***************** CHECKS AND CONSTRAINTS ************************************/
/*****************************************************************************/
-- The order here matters! Some constraints depend on others, notably the
-- foreign key constraints. Create the primary keys and independed indexes
-- first, then the defaults, then the foreign keys.
/********** PRIMARY KEYS AND INDEXES **********/
CREATE CLUSTERED INDEX [CIX_DataLogStamp_SourceIDTimestampUTCFractionOfASecond] ON [dbo].[DataLogStamp]
([SourceID],[TimestampUTC],[FractionOfASecond]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DataLogStamp] ADD
CONSTRAINT [PK_DataLogStamp_ID] PRIMARY KEY NONCLUSTERED
([ID]) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [CIX_DataLog_QuantityIDDataLogStampID] ON [dbo].[DataLog]
([QuantityID], [DataLogStampID]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[EventString] WITH NOCHECK ADD
CONSTRAINT [PK_EventString] PRIMARY KEY CLUSTERED
([ID]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[EventLog] WITH NOCHECK ADD
CONSTRAINT [PK_EventLog_ID] PRIMARY KEY NONCLUSTERED
([ID]) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [CIX_EventLog] ON [dbo].[EventLog]
([TimestampUTC], [FractionOfASecond]) ON [PRIMARY]
GO
CREATE INDEX [IX_EventAcknowledgement] ON [dbo].[EventAcknowledgement]
([EventLogID]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Namespace] WITH NOCHECK ADD
CONSTRAINT [PK_Namespace_ID] PRIMARY KEY CLUSTERED
([ID]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Namespace] WITH NOCHECK ADD
CONSTRAINT [IX_Namespace_Name] UNIQUE NONCLUSTERED
([Name]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Quantity] WITH NOCHECK ADD
CONSTRAINT [PK_Quantity_ID] PRIMARY KEY CLUSTERED
([ID]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Quantity] WITH NOCHECK ADD
CONSTRAINT [IX_Quantity_Name] UNIQUE NONCLUSTERED
([Name]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Source] WITH NOCHECK ADD
CONSTRAINT [PK_Source_ID] PRIMARY KEY CLUSTERED
([ID]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Source] WITH NOCHECK ADD
CONSTRAINT [IX_Source_Name_NamespaceID] UNIQUE NONCLUSTERED
([Name],[NamespaceID]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[SourceType] WITH NOCHECK ADD
CONSTRAINT [PK_SourceType_ID] PRIMARY KEY CLUSTERED
([ID]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[TimeZone] WITH NOCHECK ADD
CONSTRAINT [PK_TimeZone_ID] PRIMARY KEY CLUSTERED
([ID]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[TimeZone] WITH NOCHECK ADD
CONSTRAINT [IX_TimeZone_DisplayName] UNIQUE NONCLUSTERED
([DisplayName]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[WaveformLog] WITH NOCHECK ADD
CONSTRAINT [PK_WaveformLog_ID] PRIMARY KEY NONCLUSTERED
([ID]) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [CIX_WaveformLog] ON [dbo].[WaveformLog]
([TimestampUTC], [FractionOfASecond]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Registry] WITH NOCHECK ADD
CONSTRAINT [IX_Name] UNIQUE NONCLUSTERED
([Name]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[PhaseLabelType] WITH NOCHECK ADD
CONSTRAINT [PK_PhaseLabelType_ID] PRIMARY KEY CLUSTERED
([ID]) ON [PRIMARY]
GO
ALTER TABLE [dbo].[PhaseLabelType] WITH NOCHECK ADD
CONSTRAINT [IX_PhaseLabelType_PhaseLabel] UNIQUE NONCLUSTERED
([PhaseLabels]) ON [PRIMARY]
GO
/********** CHECK CONSTRAINTS *******/
ALTER TABLE [dbo].[WaveformLog] WITH NOCHECK ADD
CONSTRAINT [CK_WaveformLog_SamplingFreq] CHECK ([SamplingFrequency] >= 0)
GO
ALTER TABLE [dbo].[WaveformLog] WITH NOCHECK ADD
CONSTRAINT [CK_WaveformLog_DataPointType] CHECK
([DataPointType] = 'int-16' or [DataPointType] = 'int-32'
or [DataPointType] = 'float-32' or [DataPointType] = 'float-64'
or [DataPointType] = 'uint-16' or [DataPointType] = 'uint-32'
or [DataPointType] = 'int-8' or [DataPointType] = 'uint-8')
GO
/********** DEFAULTS **********/
ALTER TABLE [dbo].[DataLogStamp] WITH NOCHECK ADD
CONSTRAINT [DF_DataLogStamp_FractionOfASecond] DEFAULT (0.0) FOR [FractionOfASecond]
GO
ALTER TABLE [dbo].[EventLog] WITH NOCHECK ADD
CONSTRAINT [DF_EventLog_FractionOfASecond] DEFAULT (0.0) FOR [FractionOfASecond]
GO
ALTER TABLE [dbo].[WaveformLog] WITH NOCHECK ADD
CONSTRAINT [DF_WaveformLog_FractionOfASecond] DEFAULT (0.0) FOR [FractionOfASecond],
CONSTRAINT [DF_WaveformLog_SamplingFrequency] DEFAULT (0) FOR [SamplingFrequency],
CONSTRAINT [DF_WaveformLog_Scale] DEFAULT (0) FOR [Scale],
CONSTRAINT [DF_WaveformLog_Offset] DEFAULT (0) FOR [Offset]
GO
ALTER TABLE [dbo].[Registry] WITH NOCHECK ADD
CONSTRAINT [DF_ReadOnly] DEFAULT (0) FOR [ReadOnly]
GO
ALTER TABLE [dbo].[EventAcknowledgement] ADD
CONSTRAINT [DF_EventAcknowledgement_TimestampUTC] DEFAULT (getutcdate()) FOR [AckTimeUTC]
GO
/********** FOREIGN KEYS AND REFERENCES **********/
ALTER TABLE [dbo].[DataLogStamp] ADD
CONSTRAINT [FK_DataLogStamp_Source] FOREIGN KEY
([SourceID]) REFERENCES [dbo].[Source] ([ID])
GO
ALTER TABLE [dbo].[DataLog] ADD
CONSTRAINT [FK_DataLog_DataLogStamp] FOREIGN KEY
([DataLogStampID]) REFERENCES [dbo].[DataLogStamp] ([ID]),
CONSTRAINT [FK_DataLog_Quantity] FOREIGN KEY
([QuantityID]) REFERENCES [dbo].[Quantity] ([ID])
GO
ALTER TABLE [dbo].[EventLog] ADD
CONSTRAINT [FK_EventLog_Source] FOREIGN KEY
([SourceID]) REFERENCES [dbo].[Source] ([ID]),
CONSTRAINT [FK_EventLog_EventString_CauseID] FOREIGN KEY
([CauseID]) REFERENCES [dbo].[EventString] ([ID]),
CONSTRAINT [FK_EventLog_EventString_CauseValueID] FOREIGN KEY
([CauseValueID]) REFERENCES [dbo].[EventString] ([ID]),
CONSTRAINT [FK_EventLog_EventString_EffectID] FOREIGN KEY
([EffectID]) REFERENCES [dbo].[EventString] ([ID]),
CONSTRAINT [FK_EventLog_EventString_EffectValueID] FOREIGN KEY
([EffectValueID]) REFERENCES [dbo].[EventString] ([ID])
GO
ALTER TABLE [dbo].[EventAcknowledgement] ADD
CONSTRAINT [FK_EventAcknowledgement_EventLog] FOREIGN KEY
([EventLogID]) REFERENCES [dbo].[EventLog] ([ID])
GO
ALTER TABLE [dbo].[Source] ADD
CONSTRAINT [FK_Source_Namespace] FOREIGN KEY
([NamespaceID]) REFERENCES [dbo].[Namespace] ([ID]),
CONSTRAINT [FK_Source_SourceType] FOREIGN KEY
([SourceTypeID]) REFERENCES [dbo].[SourceType] ([ID]),
CONSTRAINT [FK_Source_TimeZone] FOREIGN KEY
([TimeZoneID]) REFERENCES [dbo].[TimeZone] ([ID])
GO
ALTER TABLE [dbo].[WaveformLog] ADD
CONSTRAINT [FK_WaveformLog_Quantity] FOREIGN KEY
([QuantityID]) REFERENCES [dbo].[Quantity] ([ID]),
CONSTRAINT [FK_WaveformLog_Source] FOREIGN KEY
([SourceID]) REFERENCES [dbo].[Source] ([ID])
GO
/*****************************************************************************/
/***************** CREATE STORED PROCEDURES **********************************/
/*****************************************************************************/
/***********************************************************
TrimRecords: optionally trim Datalog,waveforms,
events.
Optional Time parameter
Determines time of day to trim to
If not passed, then midnight local time will be used.
If passed it must be local time in ODBC format.
Calls LocalToUTCDate stored procedure
**********************************************************/
CREATE PROCEDURE TrimRecords
(
@DaysToKeep SMALLINT=30,
@TrimDataLog BIT=1,
@TrimWaveForm BIT=1,
@TrimEvent BIT=1,
@Time NVARCHAR(255) = '00:00:00'
)
AS
SET NOCOUNT ON
IF (@DaysToKeep <= 0)
BEGIN
TRUNCATE TABLE DataLog
DELETE FROM DataLogStamp
TRUNCATE TABLE WaveformLog
DELETE FROM EventAcknowledgement
DELETE FROM EventLog
DELETE FROM EventString
END
ELSE
BEGIN
DECLARE @StartDate NVARCHAR(255)
DECLARE @LocalNowDate DateTime
Set @LocalNowDate = GetDate()
Set @StartDate = LTRIM(YEAR(@LocalNowDate))+'-'+RIGHT(('0'+LTRIM(MONTH(@LocalNowDate))),2)+'-'+RIGHT(('0'+LTRIM(DAY(@LocalNowDate))),2)+' '+@Time
Set @StartDate = DATEADD(dd, - @DaysToKeep,@StartDate)
Exec LocalToUTCDate @StartDate, @StartDate OUTPUT
--PRINT @StartDate
DECLARE @whereSQL nvarchar(200)
SET @whereSQL = ' WHERE TimestampUTC < '''+@StartDate+''''
IF (@TrimDataLog = 1)
BEGIN
CREATE TABLE #T1(DLSid INT)
EXEC('INSERT INTO #T1 Select ID FROM DataLogStamp'+@whereSQL)
DELETE FROM DataLog WHERE DataLog.DataLogStampID IN (Select DLSid FROM #T1)
DELETE FROM DataLogStamp WHERE DataLogStamp.ID IN (Select DLSid FROM #T1)
DROP TABLE #T1
END
IF (@TrimWaveForm = 1)
BEGIN
EXEC ('DELETE FROM WaveformLog ' + @whereSQL)
END
IF (@TrimEvent = 1)
BEGIN
CREATE TABLE #T2(EventId INT)
EXEC('INSERT INTO #T2 Select ID FROM EventLog'+@whereSQL)
DELETE FROM EventAcknowledgement WHERE EventLogID IN (Select EventId FROM #T2)
--following NOT IN clause for when user acknowledged event between table deletes
--would otherwise violate foreign key constraint
DELETE FROM EventLog WHERE EventLog.ID IN (Select EventId FROM #T2)AND EventLog.Id NOT IN (Select EventAcknowledgement.EventLogID from EventAcknowledgement)
DROP TABLE #T2
END
END
RETURN 0
GO
May 27, 2004 at 10:33 am
Ah, purging... I see several possible approaches. One is to forget purging, and periodically switch all your writes to a new database. Assuming that is not an option, there are several reasons why things are slow, and several corresponding approaches. In general, deletes using large joins are just inefficient. One alternative which keeps the current structure would be to do something like:
while exists (select * from #T1)
begin
select top 100 * into #T2 from #T1
-- delete using #T2.......
delete #T1 where ID in (select ID from #T2)
drop table #T2
waitfor delay '00:00:01' -- let others insert
end
This will allow other processes access to the tables for 1 second between every 100 deletes, (or 1000, or whatever number doesn't take too long). The deletes themselves may or may not be faster, depending on whether the server uses a different execution plan because the join with #T2 is so much smaller than the join with #T1, not sure.
If this tactic is not sufficient, you have some structural changes to make. One basic approach would be to use "on delete cascade" clauses in the foreign key constraints and delete only parent rows, using a loop with a waitfor as above. When foreign key constraints are present, the server WILL check for "child" records, which takes considerable time - might as well have it delete them during the check.
To really speed things up though, you may need to forego some referential integrity enforcement on the server side (hopefully your client programs are free from referential errors) and restructure your indexing so that deletes can be performed without a join, using just an index (preferably clustered) with the TimestampUTC field as the first column in the index. For example, your index on datalogstamp that has sourceid as the first column is useless if a statement does not supply any sourceid(s), and if there are fewer than two dozen or so sourceid(s) existing in the data, the index will NEVER be chosen for use by the optimizer...
Try using the waitfor technique first, which will be needed in order to allow other processes concurrent access, no matter which approach you take.
Hope this helps - fred dot williams at sqlsavior dot com
May 27, 2004 at 11:26 am
P.S. Make sure every foreign key pointing to a table subject to purging has a corresponding index on the foreign key columns. If no such index is present, a table scan is likely to result every time a parent record is deleted. Unlike primary key constraints, foreign key constraints do not automatically create a corresponding index...
May 27, 2004 at 2:30 pm
The biggest problem I saw with your schema is lack of indexes. Definitely index the columns that are referenced by Foreign Keys, but I'll bet there are more besides.
cl
Signature is NULL
June 6, 2004 at 9:51 pm
You can disable the constraint checking on a table before deleting from it, then re-enable afterwards:
ALTER TABLE tablename NOCHECK CONSTRAINT constraintname
--or-- ALTER TABLE tablename NOCHECK CONSTRAINT ALL
-- "CHECK" to enable.
June 9, 2004 at 12:13 am
There are a number of approaches that you can follow, as the others mentioned. A couple of rules though:
1. Have a clustered index on your primary key - I have done much testing, and deletes are much much (i.e. order of magnitude) faster than with a non clustered key/index. (So you must have an index, and make sure its a clustered index).
2. Don't disable constraints. Constraints are there for a reason, and if you have an index on the referenced tables, it should not slow performance too much.
3. Batch deletes. This was also mentioned by other posters. In my experience, doing deletes 100 at a time is not sufficient. Rather do them 1000 at a time. Be careful with increasing this though - although the deletes will be faster, remember that increasing this number increases the log file usage.
You can do this fairly easily by using the following statement:
SET ROWCOUNT 1000
declare @rows INT
set @rows = 1000
while (@rows = 1000)
begin
delete from table
set @rows = @@rowcount
end
4. Play around with different statements, and check the query plans. This often highlights table scans and index scans which should not be occuring very often.
5. If you still have performance issues, try horizontal partitioning of the data. So you have a seperate table for each months data, and a view over all the data. Then, when it comes time to delete old data, drop the old table and redefine your view.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply