June 9, 2010 at 12:10 pm
I have a stored procedure that runs a remote query to fill a temp table. It is by far the most IO and CPU intensive query on my server. Hopefully someone can help me understand how to optimize this query. The remote table has a bit over 3 million rows. Looking at the execution plan the remote scan is 97% of the query cost. I'm thinking that I should be able to select just a subset of that table rather than the whole thing but I am still trying to decipher the CTE (I'm still relatively new to them and find their syntax baffling). Thank you for your help.
Nathan
This is the remote table definition
USE [usascada]
GO
/****** Object: Table [dbo].[FloatTable] Script Date: 06/09/2010 10:04:17 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[FloatTable](
[DateAndTime] [datetime] NULL,
[Millitm] [smallint] NULL,
[TagIndex] [smallint] NULL,
[Val] [float] NULL,
[Status] [varchar](1) NULL,
[Marker] [varchar](1) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
USE [usascada]
GO
/****** Object: Index [FloatTableIndex] Script Date: 06/09/2010 10:04:52 ******/
CREATE CLUSTERED INDEX [FloatTableIndex] ON [dbo].[FloatTable]
(
[DateAndTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 70) ON [PRIMARY]
USE [usascada]
GO
/****** Object: Index [TagIndex] Script Date: 06/09/2010 10:05:15 ******/
CREATE NONCLUSTERED INDEX [TagIndex] ON [dbo].[FloatTable]
(
[TagIndex] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Here is the query
declare @sd datetime;
select @sd = dateadd([day], datediff([day], '19000101', getdate()) - 1, '19000101');
-- create temp tagData table
CREATE TABLE #tagData (
[DateAndTime] [datetime] NULL ,
[Millitm] [smallint] NULL ,
[TagIndex] [smallint] NULL ,
[Val] [float] NULL ,
[Status] [char] (1) NULL ,
[Marker] [char] (1) NULL );
INSERT INTO #tagData SELECT * FROM OPENQUERY([scada-02],'SELECT * FROM usascada.dbo.FloatTable');
WITH cteSequence AS
( SELECT TagIndex, DateAndTime, Millitm, Val, ROW_NUMBER() OVER(PARTITION BY TagIndex ORDER BY DateAndTime, Millitm DESC) as row
FROM #tagData),cteSeqDups AS
( SELECT *FROM cteSequence s1WHERE EXISTS
( SELECT * FROM cteSequence s2 WHERE s2.TagIndex = s1.TagIndex AND s2.row = s1.row - 1 AND s2.Val = s1.Val)
)
DELETE FROM cteSeqDups WHERE DateAndTime < @sd;
INSERT INTO scada_archive.dbo.ScadaArchive SELECT * FROM #tagData WHERE DateAndTime < @sd;
June 9, 2010 at 1:54 pm
This is the cause of the Remote Scan, not the CTE:
INSERT INTO #tagData SELECT * FROM OPENQUERY([scada-02],'SELECT * FROM usascada.dbo.FloatTable');
What I'd suggest is putting some filters into that SELECT statement to limit the amount of data being moved across the system. That's going to give you the biggest hit.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 9, 2010 at 1:56 pm
That is what I figured however in order to know what filter to apply to the select statement I have to know what the CTE is doing to make sure I get the right records.
Nathan
January 14, 2011 at 7:06 am
In this line of code: INSERT INTO scada_archive.dbo.ScadaArchive SELECT * FROM #tagData WHERE DateAndTime < @sd;
you filter on DateAndTime... just filtering your remote table on that will return fewer records.
Is there an index on the DateAndTime column? If not, add one, it will help as well.
January 14, 2011 at 7:14 am
bleister88 (1/14/2011)
In this line of code: INSERT INTO scada_archive.dbo.ScadaArchive SELECT * FROM #tagData WHERE DateAndTime < @sd;you filter on DateAndTime... just filtering your remote table on that will return fewer records.
Is there an index on the DateAndTime column? If not, add one, it will help as well.
The where will definitely, maybe help. But the index will most likely not help unless it's clustered. Assuming you're getting all the data before midnight today, then you're going to get almost all the data in the table anyways, which renders and extra index useless.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply