September 27, 2011 at 10:01 am
We have machines that continuously send messages which are stored in a table. The machine also has different stages of operation (events) which are tracked in a separate sql table with a machine ID, EventGuid, begin and end times.
the problem is that the message does not include the event (EventGuid)
I need to combine these 2 tables so I can link the message to the event on that machine.
My current query uses a between clause in the join condition to join a date to a date range.
Looking for some suggestion to help speed things up current run time on this query is nearly 3 minutes.
both tables have around 700,000 rows (roughly only a 1/5 of the events will link up with messages), the where clause checks for the minimum message date to help filter out event rows that will not have a match.
82% of the execution is shared between 2 Hash Matches
The query is fully covered by the indexes.
here is the statement I have now. We are using SSIS as well so if there is a solution that uses a data flow I'm willing to go that route as well.
UPDATE p
SET p.EventGuid = f.EventGuid
FROM dbo.ParsedMessage p
JOIN dbo.EventReference f
ON p.MachineID = f.MachineID
and p.MessageDateTime BETWEEN f.EventBegin AND f.EventEnd
WHERE p.EventGuid IS NULL
AND f.EventBegin >= (SELECT MIN(MessageDateTime) FROM dbo.ParsedMessage)
September 27, 2011 at 10:33 am
I seriously doubt this is doing anything useful in your query, since the Join math already enforces that rule.
AND f.EventBegin >= (SELECT MIN(MessageDateTime) FROM dbo.ParsedMessage)
Can you post the execution plan?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 27, 2011 at 11:08 am
Ok, I've attached 2 plans, the original (plan1) and a second with the where condition removed (plan2)
September 28, 2011 at 8:12 am
Try these two versions. (Dev/Test environment, of course.) What kind of performance and execution plans do you get with them?
UPDATE dbo.ParsedMessage
SET p.EventGuid = (SELECT EventGuid
FROM dbo.EventReference f
WHERE p.MachineID = f.MachineID
AND p.MessageDateTime BETWEEN f.EventBegin
AND f.EventEnd)
WHERE p.EventGuid IS NULL ;
MERGE INTO dbo.ParsedMessage AS P
USING
(SELECT EventGuid,
EventBegin,
EventEnd
FROM dbo.EventReference) AS F
ON p.MessageDateTime BETWEEN f.EventBegin
AND f.EventEnd
AND p.MachineID = f.MachineID
WHEN MATCHED AND P.EventGUID IS NULL
THEN UPDATE
SET EventGUID = F.EventGUID ;
I'd also like to see table and index definitions. I suspect we could get improved efficiency from some changes to the index structure.
I'm assuming from the forum that you are on SQL 2008. Is that correct? If so, which Edition?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 28, 2011 at 9:47 am
GSquared thanks for helping with this.
We are on 2008 R2 Ent.
here are the table and index defs
CREATE TABLE ADMStage.[dbo].[EventReference] (
[EventGuid] [uniqueidentifier] NOT NULL
,[DateKey] [int] NOT NULL
,[RefNumber] [varchar](4) NOT NULL
,[Sequence] [int] NOT NULL
,[EventBegin] [datetime] NOT NULL
,[EventEnd] [datetime] NOT NULL
,[MachineID] [varchar](6) NULL
,CONSTRAINT [PK_FlightReference] PRIMARY KEY CLUSTERED ([EventGuid] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY])
ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[EventReference] ADD CONSTRAINT [DF_FlightReference_SegmentGuid] DEFAULT (NEWID()) FOR [EventGuid]
CREATE NONCLUSTERED INDEX [idx_Mach_Times] ON [dbo].[EventReference]
(
[MachineID] ASC,
[EventBegin] ASC,
[EventEnd] ASC
)
INCLUDE ( [EventGuid],
[DateKey],
[RefNumber]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE TABLE ADM.[dbo].[ParsedMessage](
[MSCLEUNIK] [int] NOT NULL,
[SMI] [varchar](3) NULL,
[SubSMI] [varchar](8) NULL,
[SubSMI2] [varchar](8) NULL,
[Version] [varchar](8) NULL,
[MessageDateTime] [datetime] NULL,
[MachineID] [varchar](6) NULL,
[RefNumber] [varchar](4) NULL,
[RefCo] [varchar](4) NULL,
[ThisTableCreateDT] [datetime] NOT NULL,
[EventGuid] [uniqueidentifier] NULL,
[MsgField1] [varchar](24) NULL,
[MsgField2] [varchar](24) NULL,
[MsgField3] [varchar](24) NULL,
[MsgField4] [varchar](24) NULL,
[MsgField5] [varchar](24) NULL,
[MsgField6] [varchar](24) NULL,
[MsgField7] [varchar](24) NULL,
[MsgField8] [varchar](24) NULL,
[MsgField9] [varchar](24) NULL,
[MsgField10] [varchar](24) NULL,
[MsgField11] [varchar](24) NULL,
[MsgField12] [varchar](24) NULL,
[MsgField13] [varchar](24) NULL,
[MsgField14] [varchar](24) NULL,
[MsgField15] [varchar](24) NULL,
[MsgField16] [varchar](24) NULL,
[MsgField17] [varchar](24) NULL,
[MsgField18] [varchar](24) NULL,
[MsgField19] [varchar](24) NULL,
[MsgField20] [varchar](24) NULL,
[MsgField21] [varchar](24) NULL,
[MsgField22] [varchar](24) NULL,
[MsgField23] [varchar](24) NULL,
[MsgField24] [varchar](24) NULL,
[MsgField25] [varchar](24) NULL,
[MsgField26] [varchar](24) NULL,
[MsgField27] [varchar](24) NULL,
[MsgField28] [varchar](24) NULL,
[MsgField29] [varchar](24) NULL,
[MsgField30] [varchar](24) NULL,
[MsgField31] [varchar](24) NULL,
[MsgField32] [varchar](24) NULL,
[MsgField33] [varchar](24) NULL,
[MsgField34] [varchar](24) NULL,
[MsgField35] [varchar](24) NULL,
[MsgField36] [varchar](24) NULL,
[MsgField37] [varchar](24) NULL,
[MsgField38] [varchar](24) NULL,
[MsgField39] [varchar](24) NULL,
[MsgField40] [varchar](24) NULL,
[MsgField41] [varchar](24) NULL,
[MsgField42] [varchar](24) NULL,
[MsgField43] [varchar](24) NULL,
[MsgField44] [varchar](24) NULL,
[MsgField45] [varchar](24) NULL,
[MsgField46] [varchar](24) NULL,
[MsgField47] [varchar](24) NULL,
[MsgField48] [varchar](24) NULL,
[MsgField49] [varchar](24) NULL,
[MsgField50] [varchar](24) NULL,
[MsgField51] [varchar](24) NULL,
[MsgField52] [varchar](24) NULL,
[MsgField53] [varchar](24) NULL,
[MsgField54] [varchar](24) NULL,
[MsgField55] [varchar](24) NULL,
[MsgField56] [varchar](24) NULL,
[MsgField57] [varchar](24) NULL,
[MsgField58] [varchar](24) NULL,
[MsgField59] [varchar](24) NULL,
[MsgField60] [varchar](24) NULL,
[MsgField61] [varchar](24) NULL,
[MsgField62] [varchar](24) NULL,
[MsgField63] [varchar](24) NULL,
[MsgField64] [varchar](24) NULL,
[MsgField65] [varchar](24) NULL,
[MsgField66] [varchar](24) NULL,
[MsgField67] [varchar](24) NULL,
[MsgField68] [varchar](24) NULL,
[MsgField69] [varchar](24) NULL,
[MsgField70] [varchar](24) NULL,
[MsgField71] [varchar](24) NULL,
[MsgField72] [varchar](24) NULL,
[MsgField73] [varchar](24) NULL,
[MsgField74] [varchar](24) NULL,
[MsgField75] [varchar](24) NULL,
[MsgField76] [varchar](24) NULL,
[MsgField77] [varchar](24) NULL,
[MsgField78] [varchar](24) NULL,
[MsgField79] [varchar](24) NULL,
[MsgField80] [varchar](24) NULL,
[MsgField81] [varchar](24) NULL,
[MsgField82] [varchar](24) NULL,
[MsgField83] [varchar](24) NULL,
[MsgField84] [varchar](24) NULL,
[MsgField85] [varchar](24) NULL,
[MsgField86] [varchar](24) NULL,
[MsgField87] [varchar](24) NULL,
[MsgField88] [varchar](24) NULL,
[MsgField89] [varchar](24) NULL,
[MsgField90] [varchar](24) NULL,
[MsgField91] [varchar](24) NULL,
[MsgField92] [varchar](24) NULL,
[MsgField93] [varchar](24) NULL,
[MsgField94] [varchar](24) NULL,
[MsgField95] [varchar](24) NULL,
[MsgField96] [varchar](24) NULL,
[MsgField97] [varchar](24) NULL,
[MsgField98] [varchar](24) NULL,
[MsgField99] [varchar](24) NULL,
[MsgField100] [varchar](24) NULL,
[MsgField101] [varchar](24) NULL,
[MsgField102] [varchar](24) NULL,
[MsgField103] [varchar](24) NULL,
[MsgField104] [varchar](24) NULL,
[MsgField105] [varchar](24) NULL,
[MsgField106] [varchar](24) NULL,
[MsgField107] [varchar](24) NULL,
[MsgField108] [varchar](24) NULL,
[MsgField109] [varchar](24) NULL,
[MsgField110] [varchar](24) NULL,
CONSTRAINT [PK_ParsedMessage] PRIMARY KEY CLUSTERED
(
[MSCLEUNIK] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ParsedMessage] ADD CONSTRAINT [DF_ParsedMessage_ThisTableCreateDT] DEFAULT (getdate()) FOR [ThisTableCreateDT]
CREATE NONCLUSTERED INDEX [idx_Mach_MsgDate] ON [dbo].[ParsedMessage]
(
[MachineID] ASC,
[MessageDateTime] ASC
)
INCLUDE ( [SMI],
[EventGuid]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Attached are the 2 estimated exec plans for the 2 queries you submitted Qry1 and Qry2
September 28, 2011 at 11:27 am
Also note these table are in 2 different databases, I didn't indicate that in the statement I originally posted.
Here is execution results of the queries you sent, I modified them slightly to qualify the names and added machine ID in the using clause of the merge since it was referenced in the join.
All queries are being run over the same static data set. since the guids have all been updated previously the result is 0 rows updated, but the execution times are still very high. I'm thinking they way to do this now is to use a sub select and filter out the non-null event guids before the join.
UPDATE ADM.dbo.ParsedMessage
SET ADM.dbo.ParsedMessage.EventGuid = (SELECT EventGuid
FROM dbo.EventReference f
WHERE ADM.dbo.ParsedMessage.MachineID = f.MachineID
AND ADM.dbo.ParsedMessage.MessageDateTime BETWEEN f.EventBegin
AND f.EventEnd)
WHERE ADM.dbo.ParsedMessage.EventGuid IS NULL ;
Execution time over 12 minutes (canceled execution)
MERGE INTO ADM.dbo.ParsedMessage AS P
USING
(SELECT EventGuid,
EventBegin,
EventEnd
,MachineID
FROM ADMStage.dbo.EventReference) AS F
ON p.MessageDateTime BETWEEN f.EventBegin
AND f.EventEnd
AND p.MachineID = f.MachineID
WHEN MATCHED AND P.EventGUID IS NULL
THEN UPDATE
SET p.EventGUID = F.EventGUID ;
Execution time over 9 minutes (canceled, sorry not patient enough to wait)
Here is my original query using the sub-selects
UPDATE ss
SET ss.EventGuid = ss2.EventGuid
FROM (SELECT MachineID, MessageDateTime,EventGuid
FROM ADM.dbo.ParsedMessage
WHERE EventGuid IS NULL) ss
JOIN (SELECT MachineID,EventBegin,EventEnd,EventGuid
FROM ADMStage.dbo.EventReference
WHERE EventBegin >= (SELECT MIN(MessageDateTime) FROM ADM.dbo.ParsedMessage)
) ss2
ON ss.MachineID = ss2.MachineID
and ss.MessageDateTime BETWEEN ss2.EventBegin AND ss2.EventEnd
Execution time 1:27
Now under normal run conditions in production we are only going to be updating about 10,000 rows per run. I setup the test to do just 10k rows
My query does that in 43 seconds using the subselects which is still long. The merge is very long running and your other query is the same at 43 seconds.
I could live with that unless you had any other thoughts with the indexing perhaps...
September 29, 2011 at 6:37 am
No, the other query formats were just to see what the SQL engine is doing behind the scenes in building it's execution plans. No need to do anything more with those.
On the indexing, try moving the EventBegin and the MessageDateTime columns to the front of the indexes. Actually, create parallel indexes with those. See what that does to the execution plan and runtime.
CREATE NONCLUSTERED INDEX [idx_Times_Mach] ON [dbo].[EventReference]
(
[EventBegin] ASC,
[EventEnd] ASC,
[MachineID] ASC
)
INCLUDE ( [EventGuid],
[DateKey],
[RefNumber]) ;
CREATE NONCLUSTERED INDEX [idx_MsgDate_Mach] ON [dbo].[ParsedMessage]
(
[MessageDateTime] ASC,
[MachineID] ASC
)
INCLUDE ( [SMI],
[EventGuid]) ;
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 29, 2011 at 9:58 am
OK, so I added the 2 indexes you suggested and the performance was the same for run time.
But, I then deleted my original index from the parsed message table and execution time when from almost 4 seconds to 800ms and now the exec plan is suggesting a new index which I added then exec time when back to nearly 4 seconds.
All tests at this point are for updating 10,000 rows.
I'm very pleased with the results just confused why it would suggest an index that ultimately results in poorer performance.
Anyway I've attached the exec plans without the new indexes and with the new indexes after I deleted my existing one.
I can't say how much I appreciate your help with this.
this is the query I have been running:
UPDATE ss
SET ss.EventGuid = ss2.EventGuid
FROM (SELECT MachineID, MessageDateTime,EventGuid
FROM ADM.dbo.ParsedMessage
WHERE EventGuid IS NULL) ss
JOIN (SELECT MachineID,EventBegin,EventEnd,EventGuid
FROM ADMStage.dbo.EventReference
WHERE EventBegin >= (SELECT MIN(MessageDateTime) FROM ADM.dbo.ParsedMessage)
) ss2
ON ss.MachineID = ss2.MachineID
and ss.MessageDateTime BETWEEN ss2.EventBegin AND ss2.EventEnd
September 30, 2011 at 6:08 am
The automatic index suggestions always need to be taken with a grain of salt.
Glad I could help.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply