Joining tables based using a date to a date range

  • We have machines that continuously send messages which are stored in a table. The machine also has different stages of operation (events) which are tracked in a separate sql table with a machine ID, EventGuid, begin and end times.

    the problem is that the message does not include the event (EventGuid)

    I need to combine these 2 tables so I can link the message to the event on that machine.

    My current query uses a between clause in the join condition to join a date to a date range.

    Looking for some suggestion to help speed things up current run time on this query is nearly 3 minutes.

    both tables have around 700,000 rows (roughly only a 1/5 of the events will link up with messages), the where clause checks for the minimum message date to help filter out event rows that will not have a match.

    82% of the execution is shared between 2 Hash Matches

    The query is fully covered by the indexes.

    here is the statement I have now. We are using SSIS as well so if there is a solution that uses a data flow I'm willing to go that route as well.

    UPDATE p

    SET p.EventGuid = f.EventGuid

    FROM dbo.ParsedMessage p

    JOIN dbo.EventReference f

    ON p.MachineID = f.MachineID

    and p.MessageDateTime BETWEEN f.EventBegin AND f.EventEnd

    WHERE p.EventGuid IS NULL

    AND f.EventBegin >= (SELECT MIN(MessageDateTime) FROM dbo.ParsedMessage)

  • I seriously doubt this is doing anything useful in your query, since the Join math already enforces that rule.

    AND f.EventBegin >= (SELECT MIN(MessageDateTime) FROM dbo.ParsedMessage)

    Can you post the execution plan?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Ok, I've attached 2 plans, the original (plan1) and a second with the where condition removed (plan2)

  • Try these two versions. (Dev/Test environment, of course.) What kind of performance and execution plans do you get with them?

    UPDATE dbo.ParsedMessage

    SET p.EventGuid = (SELECT EventGuid

    FROM dbo.EventReference f

    WHERE p.MachineID = f.MachineID

    AND p.MessageDateTime BETWEEN f.EventBegin

    AND f.EventEnd)

    WHERE p.EventGuid IS NULL ;

    MERGE INTO dbo.ParsedMessage AS P

    USING

    (SELECT EventGuid,

    EventBegin,

    EventEnd

    FROM dbo.EventReference) AS F

    ON p.MessageDateTime BETWEEN f.EventBegin

    AND f.EventEnd

    AND p.MachineID = f.MachineID

    WHEN MATCHED AND P.EventGUID IS NULL

    THEN UPDATE

    SET EventGUID = F.EventGUID ;

    I'd also like to see table and index definitions. I suspect we could get improved efficiency from some changes to the index structure.

    I'm assuming from the forum that you are on SQL 2008. Is that correct? If so, which Edition?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared thanks for helping with this.

    We are on 2008 R2 Ent.

    here are the table and index defs

    CREATE TABLE ADMStage.[dbo].[EventReference] (

    [EventGuid] [uniqueidentifier] NOT NULL

    ,[DateKey] [int] NOT NULL

    ,[RefNumber] [varchar](4) NOT NULL

    ,[Sequence] [int] NOT NULL

    ,[EventBegin] [datetime] NOT NULL

    ,[EventEnd] [datetime] NOT NULL

    ,[MachineID] [varchar](6) NULL

    ,CONSTRAINT [PK_FlightReference] PRIMARY KEY CLUSTERED ([EventGuid] ASC)

    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,

    IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON) ON [PRIMARY])

    ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    ALTER TABLE [dbo].[EventReference] ADD CONSTRAINT [DF_FlightReference_SegmentGuid] DEFAULT (NEWID()) FOR [EventGuid]

    CREATE NONCLUSTERED INDEX [idx_Mach_Times] ON [dbo].[EventReference]

    (

    [MachineID] ASC,

    [EventBegin] ASC,

    [EventEnd] ASC

    )

    INCLUDE ( [EventGuid],

    [DateKey],

    [RefNumber]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    CREATE TABLE ADM.[dbo].[ParsedMessage](

    [MSCLEUNIK] [int] NOT NULL,

    [SMI] [varchar](3) NULL,

    [SubSMI] [varchar](8) NULL,

    [SubSMI2] [varchar](8) NULL,

    [Version] [varchar](8) NULL,

    [MessageDateTime] [datetime] NULL,

    [MachineID] [varchar](6) NULL,

    [RefNumber] [varchar](4) NULL,

    [RefCo] [varchar](4) NULL,

    [ThisTableCreateDT] [datetime] NOT NULL,

    [EventGuid] [uniqueidentifier] NULL,

    [MsgField1] [varchar](24) NULL,

    [MsgField2] [varchar](24) NULL,

    [MsgField3] [varchar](24) NULL,

    [MsgField4] [varchar](24) NULL,

    [MsgField5] [varchar](24) NULL,

    [MsgField6] [varchar](24) NULL,

    [MsgField7] [varchar](24) NULL,

    [MsgField8] [varchar](24) NULL,

    [MsgField9] [varchar](24) NULL,

    [MsgField10] [varchar](24) NULL,

    [MsgField11] [varchar](24) NULL,

    [MsgField12] [varchar](24) NULL,

    [MsgField13] [varchar](24) NULL,

    [MsgField14] [varchar](24) NULL,

    [MsgField15] [varchar](24) NULL,

    [MsgField16] [varchar](24) NULL,

    [MsgField17] [varchar](24) NULL,

    [MsgField18] [varchar](24) NULL,

    [MsgField19] [varchar](24) NULL,

    [MsgField20] [varchar](24) NULL,

    [MsgField21] [varchar](24) NULL,

    [MsgField22] [varchar](24) NULL,

    [MsgField23] [varchar](24) NULL,

    [MsgField24] [varchar](24) NULL,

    [MsgField25] [varchar](24) NULL,

    [MsgField26] [varchar](24) NULL,

    [MsgField27] [varchar](24) NULL,

    [MsgField28] [varchar](24) NULL,

    [MsgField29] [varchar](24) NULL,

    [MsgField30] [varchar](24) NULL,

    [MsgField31] [varchar](24) NULL,

    [MsgField32] [varchar](24) NULL,

    [MsgField33] [varchar](24) NULL,

    [MsgField34] [varchar](24) NULL,

    [MsgField35] [varchar](24) NULL,

    [MsgField36] [varchar](24) NULL,

    [MsgField37] [varchar](24) NULL,

    [MsgField38] [varchar](24) NULL,

    [MsgField39] [varchar](24) NULL,

    [MsgField40] [varchar](24) NULL,

    [MsgField41] [varchar](24) NULL,

    [MsgField42] [varchar](24) NULL,

    [MsgField43] [varchar](24) NULL,

    [MsgField44] [varchar](24) NULL,

    [MsgField45] [varchar](24) NULL,

    [MsgField46] [varchar](24) NULL,

    [MsgField47] [varchar](24) NULL,

    [MsgField48] [varchar](24) NULL,

    [MsgField49] [varchar](24) NULL,

    [MsgField50] [varchar](24) NULL,

    [MsgField51] [varchar](24) NULL,

    [MsgField52] [varchar](24) NULL,

    [MsgField53] [varchar](24) NULL,

    [MsgField54] [varchar](24) NULL,

    [MsgField55] [varchar](24) NULL,

    [MsgField56] [varchar](24) NULL,

    [MsgField57] [varchar](24) NULL,

    [MsgField58] [varchar](24) NULL,

    [MsgField59] [varchar](24) NULL,

    [MsgField60] [varchar](24) NULL,

    [MsgField61] [varchar](24) NULL,

    [MsgField62] [varchar](24) NULL,

    [MsgField63] [varchar](24) NULL,

    [MsgField64] [varchar](24) NULL,

    [MsgField65] [varchar](24) NULL,

    [MsgField66] [varchar](24) NULL,

    [MsgField67] [varchar](24) NULL,

    [MsgField68] [varchar](24) NULL,

    [MsgField69] [varchar](24) NULL,

    [MsgField70] [varchar](24) NULL,

    [MsgField71] [varchar](24) NULL,

    [MsgField72] [varchar](24) NULL,

    [MsgField73] [varchar](24) NULL,

    [MsgField74] [varchar](24) NULL,

    [MsgField75] [varchar](24) NULL,

    [MsgField76] [varchar](24) NULL,

    [MsgField77] [varchar](24) NULL,

    [MsgField78] [varchar](24) NULL,

    [MsgField79] [varchar](24) NULL,

    [MsgField80] [varchar](24) NULL,

    [MsgField81] [varchar](24) NULL,

    [MsgField82] [varchar](24) NULL,

    [MsgField83] [varchar](24) NULL,

    [MsgField84] [varchar](24) NULL,

    [MsgField85] [varchar](24) NULL,

    [MsgField86] [varchar](24) NULL,

    [MsgField87] [varchar](24) NULL,

    [MsgField88] [varchar](24) NULL,

    [MsgField89] [varchar](24) NULL,

    [MsgField90] [varchar](24) NULL,

    [MsgField91] [varchar](24) NULL,

    [MsgField92] [varchar](24) NULL,

    [MsgField93] [varchar](24) NULL,

    [MsgField94] [varchar](24) NULL,

    [MsgField95] [varchar](24) NULL,

    [MsgField96] [varchar](24) NULL,

    [MsgField97] [varchar](24) NULL,

    [MsgField98] [varchar](24) NULL,

    [MsgField99] [varchar](24) NULL,

    [MsgField100] [varchar](24) NULL,

    [MsgField101] [varchar](24) NULL,

    [MsgField102] [varchar](24) NULL,

    [MsgField103] [varchar](24) NULL,

    [MsgField104] [varchar](24) NULL,

    [MsgField105] [varchar](24) NULL,

    [MsgField106] [varchar](24) NULL,

    [MsgField107] [varchar](24) NULL,

    [MsgField108] [varchar](24) NULL,

    [MsgField109] [varchar](24) NULL,

    [MsgField110] [varchar](24) NULL,

    CONSTRAINT [PK_ParsedMessage] PRIMARY KEY CLUSTERED

    (

    [MSCLEUNIK] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[ParsedMessage] ADD CONSTRAINT [DF_ParsedMessage_ThisTableCreateDT] DEFAULT (getdate()) FOR [ThisTableCreateDT]

    CREATE NONCLUSTERED INDEX [idx_Mach_MsgDate] ON [dbo].[ParsedMessage]

    (

    [MachineID] ASC,

    [MessageDateTime] ASC

    )

    INCLUDE ( [SMI],

    [EventGuid]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    Attached are the 2 estimated exec plans for the 2 queries you submitted Qry1 and Qry2

  • Also note these table are in 2 different databases, I didn't indicate that in the statement I originally posted.

    Here is execution results of the queries you sent, I modified them slightly to qualify the names and added machine ID in the using clause of the merge since it was referenced in the join.

    All queries are being run over the same static data set. since the guids have all been updated previously the result is 0 rows updated, but the execution times are still very high. I'm thinking they way to do this now is to use a sub select and filter out the non-null event guids before the join.

    UPDATE ADM.dbo.ParsedMessage

    SET ADM.dbo.ParsedMessage.EventGuid = (SELECT EventGuid

    FROM dbo.EventReference f

    WHERE ADM.dbo.ParsedMessage.MachineID = f.MachineID

    AND ADM.dbo.ParsedMessage.MessageDateTime BETWEEN f.EventBegin

    AND f.EventEnd)

    WHERE ADM.dbo.ParsedMessage.EventGuid IS NULL ;

    Execution time over 12 minutes (canceled execution)

    MERGE INTO ADM.dbo.ParsedMessage AS P

    USING

    (SELECT EventGuid,

    EventBegin,

    EventEnd

    ,MachineID

    FROM ADMStage.dbo.EventReference) AS F

    ON p.MessageDateTime BETWEEN f.EventBegin

    AND f.EventEnd

    AND p.MachineID = f.MachineID

    WHEN MATCHED AND P.EventGUID IS NULL

    THEN UPDATE

    SET p.EventGUID = F.EventGUID ;

    Execution time over 9 minutes (canceled, sorry not patient enough to wait)

    Here is my original query using the sub-selects

    UPDATE ss

    SET ss.EventGuid = ss2.EventGuid

    FROM (SELECT MachineID, MessageDateTime,EventGuid

    FROM ADM.dbo.ParsedMessage

    WHERE EventGuid IS NULL) ss

    JOIN (SELECT MachineID,EventBegin,EventEnd,EventGuid

    FROM ADMStage.dbo.EventReference

    WHERE EventBegin >= (SELECT MIN(MessageDateTime) FROM ADM.dbo.ParsedMessage)

    ) ss2

    ON ss.MachineID = ss2.MachineID

    and ss.MessageDateTime BETWEEN ss2.EventBegin AND ss2.EventEnd

    Execution time 1:27

    Now under normal run conditions in production we are only going to be updating about 10,000 rows per run. I setup the test to do just 10k rows

    My query does that in 43 seconds using the subselects which is still long. The merge is very long running and your other query is the same at 43 seconds.

    I could live with that unless you had any other thoughts with the indexing perhaps...

  • No, the other query formats were just to see what the SQL engine is doing behind the scenes in building it's execution plans. No need to do anything more with those.

    On the indexing, try moving the EventBegin and the MessageDateTime columns to the front of the indexes. Actually, create parallel indexes with those. See what that does to the execution plan and runtime.

    CREATE NONCLUSTERED INDEX [idx_Times_Mach] ON [dbo].[EventReference]

    (

    [EventBegin] ASC,

    [EventEnd] ASC,

    [MachineID] ASC

    )

    INCLUDE ( [EventGuid],

    [DateKey],

    [RefNumber]) ;

    CREATE NONCLUSTERED INDEX [idx_MsgDate_Mach] ON [dbo].[ParsedMessage]

    (

    [MessageDateTime] ASC,

    [MachineID] ASC

    )

    INCLUDE ( [SMI],

    [EventGuid]) ;

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • OK, so I added the 2 indexes you suggested and the performance was the same for run time.

    But, I then deleted my original index from the parsed message table and execution time when from almost 4 seconds to 800ms and now the exec plan is suggesting a new index which I added then exec time when back to nearly 4 seconds.

    All tests at this point are for updating 10,000 rows.

    I'm very pleased with the results just confused why it would suggest an index that ultimately results in poorer performance.

    Anyway I've attached the exec plans without the new indexes and with the new indexes after I deleted my existing one.

    I can't say how much I appreciate your help with this.

    this is the query I have been running:

    UPDATE ss

    SET ss.EventGuid = ss2.EventGuid

    FROM (SELECT MachineID, MessageDateTime,EventGuid

    FROM ADM.dbo.ParsedMessage

    WHERE EventGuid IS NULL) ss

    JOIN (SELECT MachineID,EventBegin,EventEnd,EventGuid

    FROM ADMStage.dbo.EventReference

    WHERE EventBegin >= (SELECT MIN(MessageDateTime) FROM ADM.dbo.ParsedMessage)

    ) ss2

    ON ss.MachineID = ss2.MachineID

    and ss.MessageDateTime BETWEEN ss2.EventBegin AND ss2.EventEnd

  • The automatic index suggestions always need to be taken with a grain of salt.

    Glad I could help.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply