Merge statement with conditions

  • Hello,

    I am trying to merge a temp table (@TempTable) into a table(@MasterTable). I have 6 fields I need to compare if any of them have changed I need to update the table(@MasterTable).

    Fields:

    ID (I have a 1 to 1 database, so this ID will only exist once)

    Event_Name

    [EVENT]

    Client_Name

    Servicer_Code

    Servicer_Name

    MERGE INTO MasterTable as Target

    USING (SELECT ID, Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name FROM @tempTable) as Source

    ON (Target.ID = Source.ID)

    WHEN Matched AND Target.Event_Name <> Source.Event_name OR Target.[EVENT] <> Source.[EVENT]

    OR Target.Client_Name <> Source.Client_Name OR Target.Servicer_Code <> Source.Servicer_Code

    OR Target.Servicer_Name <> Source.Servicer_Name

    THEN

    UPDATE SET Target.Event_Name = Source.Event_Name and Target.[EVENT] = Source.[EVENT]

    AND Target.Client_Name = Source.Client_Name AND Target.Servicer_Code = Source.Servicer_Code

    AND Target.Servicer_Name = Source.Servicer_Name

    WHEN NOT MATCHED BY TARGET THEN

    INSERT (ID, Event_Name, [EVENT],Client_Name, Servicer_Code, Servicer_Name)

    values (Source.ID, Source.Event_Name, Source.[EVENT],Source.Client_Name, Source.Servicer_Code, Source.Servicer_Name)

    If the ID matches then I need it to compare the other 5 fields, if any of them have changed, update the record. If ID does not exist, insert.

    David92595

  • Why do you need MERGE?

    update T

    set Event_Name = S.Event_Name ,

    Client_Name = S.Client_Name,

    [EVENT] = S.[EVENT],

    Client_Name = S.Client_Name,

    Servicer_Code = S.Servicer_Code,

    Servicer_Name = S.Servicer_Name

    FROM MasterTable T

    INNER JOIN @tempTable S ON T.ID = S.ID

    WHERE Target.Event_Name <> Source.Event_name

    OR Target.[EVENT] <> Source.[EVENT]

    OR Target.Client_Name <> Source.Client_Name

    OR Target.Servicer_Code <> Source.Servicer_Code

    OR Target.Servicer_Name <> Source.Servicer_Name

    INSERT INTO MasterTable (Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name)

    SELECT Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name

    FROM @tempTable S

    WHERE NOT EXISTS (select * from MasterTable T

    WHERE T.ID = S.ID )

    _____________
    Code for TallyGenerator

  • Wow, way over thought that one...

    Thank you!

  • Sergiy (6/23/2016)


    Why do you need MERGE?

    Don't know, maybe because it is easier to read (once you've gotten used to the merge syntax), does the same thing in a single statement and performs better? 😉

    WITH cteSource as (

    SELECT ID, Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name

    FROM @tempTable

    ),

    cteTarget as (

    SELECT ID, Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name

    FROM MasterTable mt

    WHERE EXISTS (

    SELECT *

    FROM @temptable t

    WHERE t.ID = mt.ID

    )

    )

    MERGE INTO cteTarget t

    USING cteSource s

    ON (s.ID = t.ID)

    WHEN Matched AND (

    t.Event_Name <> s.Event_name

    OR t.[EVENT] <> s.[EVENT]

    OR t.Client_Name <> s.Client_Name

    OR t.Servicer_Code <> s.Servicer_Code

    OR t.Servicer_Name <> s.Servicer_Name

    )

    THEN

    UPDATE

    SET

    t.Event_Name = s.Event_Name,

    t.[EVENT] = s.[EVENT],

    t.Client_Name = s.Client_Name,

    t.Servicer_Code = s.Servicer_Code,

    t.Servicer_Name = s.Servicer_Name

    WHEN NOT MATCHED BY TARGET

    THEN

    INSERT (ID, Event_Name, [EVENT],Client_Name, Servicer_Code, Servicer_Name)

    VALUES (s.ID, s.Event_Name, s.[EVENT],s.Client_Name, s.Servicer_Code, s.Servicer_Name);



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • UPDATE is probably better than MERGE.

    If you are a fan of MERGE, please take a read of this[/url] and decide whether, perhaps, you should reconsider.

    And rather than using this type of construction:

    UPDATE

    ...

    WHERE

    a.1 <> b.1 AND

    a.2 <> b.2

    I would suggest using

    UPDATE

    WHERE NOT EXISTS (

    SELECT a.1, a.2

    INTERSECT

    SELECT b.1, b.2

    )

    because the latter version handles NULLs.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin (6/24/2016)


    UPDATE is probably better than MERGE.

    If you are a fan of MERGE, please take a read of this[/url] and decide whether, perhaps, you should reconsider.

    And rather than using this type of construction:

    UPDATE

    ...

    WHERE

    a.1 <> b.1 AND

    a.2 <> b.2

    I would suggest using

    UPDATE

    WHERE NOT EXISTS (

    SELECT a.1, a.2

    INTERSECT

    SELECT b.1, b.2

    )

    because the latter version handles NULLs.

    I prefer using EXCEPT, simply because it closer to the way I, and probably most other people, think about this.

    ;

    WITH Source AS (

    SELECT ID, Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name

    FROM @tempTable

    EXCEPT

    SELECT ID, Event_Name, [EVENT], Client_Name, Servicer_Code, Servicer_Name

    FROM MasterTable

    )

    <your merge or update statement here>

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Paul White suggests that the NOT EXISTS ... INTERSECT form produces better query plans than the EXCEPT form, and that's why I use it. See here (in the comments).

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Thanks guys

    As a work around I was comparing one field at a time. This knocked 200 lines of code out of my report and cut execution time in half!

  • Phil Parkin (6/24/2016)


    Paul White suggests that the NOT EXISTS ... INTERSECT form produces better query plans than the EXCEPT form, and that's why I use it. See here (in the comments).

    Further down in those same comments he says that you can use EXCEPT when you are not interested in both sides, which is the case here. We don't care about records that exist in the target, but not in the source in this particular case. The NOT EXISTS ... INTERSECT version will produce those records (depending on the initial join type), whereas the EXCEPT version will not.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Phil Parkin (6/24/2016)


    Paul White suggests that the NOT EXISTS ... INTERSECT form produces better query plans than the EXCEPT form, and that's why I use it. See here (in the comments).

    No, it does not.

    Before repeating after Paul White try to reproduce it yourself.

    And don't be afraid to deviate a bit from the scripts provided by Paul.

    The problem with Paul's suggestions is that he does not stop before skewing his test scripts to make the output fitting his agenda.

    In the blog post you quoted he's provided all sorts of "alternative to INTERSECT" forms of queries, except one.

    The one which is most obvious and is most commonly used for this kind of checks:

    SELECT

    *

    FROM @Set1 AS t

    JOIN @Set2 AS s ON

    t.pk = s.pk

    WHERE

    NOT EXISTS

    (

    SELECT 1

    WHERE

    t.pk = s.pk

    AND (t.ival = s.ival OR (t.ival IS NULL AND s.ival IS NULL))

    AND (t.cval = s.cval OR (t.cval IS NULL AND s.cval IS NULL))

    AND (t.mval = s.mval OR (t.mval IS NULL AND s.mval IS NULL))

    )

    Can you guess why?

    Can you suspect he does not (did not at the time) know about this form of query?

    Oh, no, you would not discredit him as an SQL guru. He certainly knows that.

    But with this query in the set the whole point of using INTERSECT would disappear: it produces exactly the same plan, exactly the same output and, of course, exactly the same execution statistics.

    So, NOT EXISTS ... INTERSECT is not any better than NOT EXISTS (... Equal... )

    Apart from probably the shorter code.

    But even the code would be of the same length if we'd set ANSI NULLS OFF:

    SET ANSI_NULLS OFF

    SELECT

    *

    FROM @Set1 AS t

    JOIN @Set2 AS s ON

    t.pk = s.pk

    WHERE

    NOT EXISTS

    (

    SELECT 1

    WHERE

    t.pk = s.pk

    AND (t.ival = s.ival)

    AND (t.cval = s.cval)

    AND (t.mval = s.mval)

    )

    Which we cannot do because this setting is being deprecated and to be strictly blocked in newer versions of SQL Server.

    So, the only thing which is different about INTERSECT that in an environment where ANSI NULLS standard is enforced it provides a "back door" for those developers who strongly believe that NULL=NULL.

    Well, some would consider it as an advantage, I would not do it.

    That's about "JOIN" kind of queries Paul was using in his test scripts.

    But it applies strictly to "update" queries.

    _____________
    Code for TallyGenerator

  • But what if we need to identify possibly new records added to the set?

    Then we need to use a bit different form of the queries:

    SET STATISTICS IO, TIME, PROFILE ON

    SELECT *

    FROM @Set1 AS t

    WHERE

    NOT EXISTS (SELECT s.* FROM @Set2 AS s INTERSECT SELECT t.*)

    SELECT *

    FROM @Set1 AS t

    WHERE

    NOT EXISTS

    (

    SELECT 1

    FROM @Set2 s

    WHERE

    t.pk = s.pk

    AND (t.ival = s.ival OR (t.ival IS NULL AND s.ival IS NULL))

    AND (t.cval = s.cval OR (t.cval IS NULL AND s.cval IS NULL))

    AND (t.mval = s.mval OR (t.mval IS NULL AND s.mval IS NULL))

    )

    SELECT *

    FROM @Set1 AS t

    EXCEPT

    SELECT *

    FROM @Set2 s

    SET STATISTICS IO, TIME, PROFILE OFF

    If you run it you'll see:

    - all 3 queries generate exactly the same output;

    - all 3 queries result in exactly the same IO stats;

    - time stats are different from time to time, and there is no consistent winner or loser amongst these 3 queries;

    - execution plans are almost identical for all 3 queries, except INTERSECT query has an extra TOP 1 operation in it. It may be still below 1% overhead, but anyway - it's there. An overhead.

    RowsExecutesStmtTextStmtIdNodeIdParentPhysicalOpLogicalOpArgumentDefinedValuesEstimateRowsEstimateIOEstimateCPUAvgRowSizeTotalSubtreeCostOutputListWarningsTypeParallelEstimateExecutions

    41SELECT *

    FROM @Set1 AS t

    WHERE

    NOT EXISTS (SELECT s.* FROM @Set2 AS s INTERSECT SELECT t.*)1110NULLNULLNULLNULL1NULLNULLNULL0.00657166NULLNULLSELECT0NULL

    41 |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval]))1121Nested LoopsLeft Anti Semi JoinOUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval])NULL104.18E-06280.00657166[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

    61 |--Index Scan(OBJECT:(@Set1 AS [t]))1132Index ScanIndex ScanOBJECT:(@Set1 AS [t])[t].[pk], [t].[ival], [t].[cval], [t].[mval]10.0031250.0001581280.0032831[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

    26 |--Top(TOP EXPRESSION:((1)))1142TopTopTOP EXPRESSION:((1))NULL101E-0790.00328438NULLNULLPLAN_ROW01

    26 |--Clustered Index Seek(OBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set2.[ival] as .[ival] = @Set1.[ival] as [t].[ival] AND @Set2.[mval] as .[mval] = @Set1.[mval] as [t].[mval] AND @Set2.[cval] as .[cval] = @Set1.[cval] as [t].[cval]) ORDERED FORWARD)1154Clustered Index SeekClustered Index SeekOBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set2.[ival] as .[ival] = @Set1.[ival] as [t].[ival] AND @Set2.[mval] as .[mval] = @Set1.[mval] as [t].[mval] AND @Set2.[cval] as .[cval] = @Set1.[cval] as [t].[cval]) ORDERED FORWARDNULL10.0031250.0001581200.0032831NULLNULLPLAN_ROW01

    RowsExecutesStmtTextStmtIdNodeIdParentPhysicalOpLogicalOpArgumentDefinedValuesEstimateRowsEstimateIOEstimateCPUAvgRowSizeTotalSubtreeCostOutputListWarningsTypeParallelEstimateExecutions

    41SELECT *

    FROM @Set1 AS t

    WHERE

    NOT EXISTS

    (

    SELECT 1

    FROM @Set2 s

    WHERE

    t.pk = s.pk

    AND (t.ival = s.ival OR (t.ival IS NULL AND s.ival IS NULL))

    AND (t.cval = s.cval OR (t.cval IS NULL AND s.cval IS NULL))

    AND (t.mval = s.mval OR (t.mval IS NULL AND s.mval IS NULL))

    )1310NULLNULLNULLNULL1NULLNULLNULL0.00657156NULLNULLSELECT0NULL

    41 |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval]))1321Nested LoopsLeft Anti Semi JoinOUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval])NULL104.18E-06280.00657156[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

    61 |--Index Scan(OBJECT:(@Set1 AS [t]))1332Index ScanIndex ScanOBJECT:(@Set1 AS [t])[t].[pk], [t].[ival], [t].[cval], [t].[mval]10.0031250.0001581280.0032831[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

    26 |--Clustered Index Seek(OBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set1.[ival] as [t].[ival] = @Set2.[ival] as .[ival] AND @Set1.[mval] as [t].[mval] = @Set2.[mval] as .[mval] AND @Set1.[cval] as [t].[cval] = @Set2.[cval] as .[cval]) ORDERED FORWARD)1342Clustered Index SeekClustered Index SeekOBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set1.[ival] as [t].[ival] = @Set2.[ival] as .[ival] AND @Set1.[mval] as [t].[mval] = @Set2.[mval] as .[mval] AND @Set1.[cval] as [t].[cval] = @Set2.[cval] as .[cval]) ORDERED FORWARDNULL10.0031250.0001581200.0032831NULLNULLPLAN_ROW01

    RowsExecutesStmtTextStmtIdNodeIdParentPhysicalOpLogicalOpArgumentDefinedValuesEstimateRowsEstimateIOEstimateCPUAvgRowSizeTotalSubtreeCostOutputListWarningsTypeParallelEstimateExecutions

    41SELECT *

    FROM @Set1 AS t

    EXCEPT

    SELECT *

    FROM @Set2 s1510NULLNULLNULLNULL1NULLNULLNULL0.00657156NULLNULLSELECT0NULL

    41 |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval]))1521Nested LoopsLeft Anti Semi JoinOUTER REFERENCES:([t].[pk], [t].[ival], [t].[cval], [t].[mval])NULL104.18E-06280.00657156[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

    61 |--Index Scan(OBJECT:(@Set1 AS [t]))1532Index ScanIndex ScanOBJECT:(@Set1 AS [t])[t].[pk], [t].[ival], [t].[cval], [t].[mval]10.0031250.0001581280.0032831[t].[pk], [t].[ival], [t].[cval], [t].[mval]NULLPLAN_ROW01

    26 |--Clustered Index Seek(OBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set1.[ival] as [t].[ival] = @Set2.[ival] as .[ival] AND @Set1.[mval] as [t].[mval] = @Set2.[mval] as .[mval] AND @Set1.[cval] as [t].[cval] = @Set2.[cval] as .[cval]) ORDERED FORWARD)1542Clustered Index SeekClustered Index SeekOBJECT:(@Set2 AS ), SEEK:(.[pk]=@Set1.[pk] as [t].[pk]), WHERE:(@Set1.[ival] as [t].[ival] = @Set2.[ival] as .[ival] AND @Set1.[mval] as [t].[mval] = @Set2.[mval] as .[mval] AND @Set1.[cval] as [t].[cval] = @Set2.[cval] as .[cval]) ORDERED FORWARDNULL10.0031250.0001581200.0032831NULLNULLPLAN_ROW01

    I also tested with several extra rows added to table @Set1:

    INSERT @Set1

    (pk, ival, cval, mval)

    VALUES

    (1, 1000, 'a', $1),

    (2, NULL, 'b', $2),

    (3, 3000, 'c', NULL),

    (4, NULL, 'd', $4),

    (5, 5000, 'e', $5),

    (6, 6000, 'e', $6),

    (7, 7000, 'e', $7),

    (8, 8000, 'e', $8),

    (9, 9000, 'e', $9),

    (10, 10000, 'e', $10),

    (11, 11000, 'e', $11),

    (12, 12000, 'e', $12);

    ;

    Conclusion remains the same:

    1. NOT EXISTS ... INTERSECT does not provide any advantage comparing to NOT EXISTS ...EQUAL or EXCEPT forms of queries;

    2. It adds a small overhead of an additional TOP 1 operation when looking for newly added rows.

    Which is a bit different from what Paul has come up with.

    You may with to run those tests for yourself.

    See which conclusions you'll support afterwards.

    _____________
    Code for TallyGenerator

  • Sergiy, thank you for taking the time to write these informative posts.

    I admit that I am guilty of following Paul White's recommendations as would a sheep and now I shall be looking at things more closely, next time I have to develop code that does this sort of thing.

    I still prefer the EXCEPT or INTERSECT versions, because of their inherent handling of NULLs & agree that the EXCEPT form is possibly more pleasing on the brain than the NOT EXISTS ... INTERSECT form.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin (6/28/2016)


    I still prefer the EXCEPT or INTERSECT versions, because of their inherent handling of NULLs & agree that the EXCEPT form is possibly more pleasing on the brain than the NOT EXISTS ... INTERSECT form.

    That's your personal preferences, and you have every right to have it and follow it (unless you're in North Korea :-))

    Functionally all 3 methods are the same, so it's other factors which play role in choosing one method over another.

    Like with CTE - I prefer to use derived tables, some prefer CTE's, and neither side has a case to prove which method is better.

    _____________
    Code for TallyGenerator

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply