Comparing variables and handling NULLs

  • allan.ford17 - Thursday, October 18, 2018 6:11 PM

    I wanted a process that does absolute minimal DB transactions ... i.e. there may be a row-level trigger on the table that logs transactions for other integration purposes. There may also be child records in child and grandchild tables ...

    Then this should work for you.

    1. Delete obsolete records:
    DELETE T2
    FROM dbo.Table_2 T2
    WHERE NOT EXISTS (select * from dbo.Table_1 T1
            where T2.ID_COL = T1.ID_COL)

    2. Update records with IDs present in both tables and having some differences:
    UPDATE T2
    SET NAME_VAL1 = T1.NAME_VAL1,
        DATE_VAL1 = T1.DATE_VAL1,
        NUMBER_VAL1 = T1.NUMBER_VAL1
    FROM dbo.Table_2 T2
    INNER JOIN dbo.Table_1 T1 ON T2.ID_COL = T1.ID_COL
    WHERE not (
            (T1.NAME_VAL1 = T2.NAME_VAL1 or (T1.NAME_VAL1 is null and T2.NAME_VAL1 is null ))
            and (T1.DATE_VAL1 = T2.DATE_VAL1 or (T1.DATE_VAL1 is null and T2.DATE_VAL1 is null ))
            and (T1.NUMBER_VAL1 = T2.NUMBER_VAL1 or (T1.NUMBER_VAL1 is null and T2.NUMBER_VAL1 is null ))
            )

    This script will cause a clustered index scan, so it's not very efficient.
    But it cannot be improved with the existing indexing.
    With additional indexing in place, more selective strategies may be chosen (see the index sketch after step 3).
    But again - it only pays off on big data sets.

    3. Add new records:
    INSERT INTO dbo.Table_2
        (ID_COL,NAME_VAL1,DATE_VAL1,NUMBER_VAL1)
    SELECT ID_COL,NAME_VAL1,DATE_VAL1,NUMBER_VAL1
    FROM dbo.Table_1 T1
    WHERE NOT EXISTS (select * from dbo.Table_2 T2
            where T2.ID_COL = T1.ID_COL)
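
    A minimal sketch of the kind of additional indexing meant above - a covering non-clustered index on the join key plus the compared columns (the index name is just a placeholder):

    -- Sketch only: one possible covering index on the source table (name is a placeholder).
    -- It gives the optimizer a narrower structure to read when comparing the value columns,
    -- instead of scanning the whole clustered index of dbo.Table_1.
    CREATE NONCLUSTERED INDEX IX_Table_1_IdCol_Covering
        ON dbo.Table_1 (ID_COL)
        INCLUDE (NAME_VAL1, DATE_VAL1, NUMBER_VAL1);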

    When I measured performance with SET STATISTICS TIME ON, the speed difference was > 100 times.
    But it's not an accurate measure, because the statistics collected on every statement in every loop take time themselves.
    And on such a small data set my 3 statements take < 1 mcs to complete.
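
    For reference, that statistics-based measurement is just the three statements wrapped like this (a sketch; the UPDATE and INSERT from steps 2 and 3 are elided):

    SET STATISTICS TIME ON;

    DELETE T2
    FROM dbo.Table_2 T2
    WHERE NOT EXISTS (select * from dbo.Table_1 T1
            where T2.ID_COL = T1.ID_COL);

    -- ... UPDATE from step 2 and INSERT from step 3 go here ...

    SET STATISTICS TIME OFF;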

    You must have bigger sets - try to execute it like this:
    declare @TimeStamp datetime2

    select * from dbo.Table_1;
    select * from dbo.Table_2;

    select @TimeStamp = SYSDATETIME()

    DECLARE  @return_value int

    EXEC  @return_value = [dbo].[TablesSyncByCompare]

    select [Execution time - mcs] = datediff(mcs, @TimeStamp, sysdatetime() )
    SELECT  'Return Value' = @return_value

    and compare to this:

    declare @TimeStamp datetime2

    select * from dbo.Table_1;
    select * from dbo.Table_2;

    select @TimeStamp = SYSDATETIME()

    DELETE T2
    ...

    UPDATE T2
    ....
    Insert dbo.Table_2
    ....

    select [Execution time - mcs] = datediff(mcs, @TimeStamp, sysdatetime() ) 

    _____________
    Code for TallyGenerator

  • Thanks Sergiy !!

    I will give this a try too!
    Good approach .. Could adapt for a multi-column primary key also .. (e.g. concat the key values into a single value perhaps ..) - see the sketch after this post.
    One reason for wanting to process RBAR (Row by agonising row) might be to handle / capture errors that occur.
    e.g. bad data ... data that doesn't meet foreign key constraints or other constraints ..
    May want to capture the bad data but process as much good data as available.
    In some cases it is OK to assume all the data is good and go with a process that does not have error handling ..

    cheers, thanks,
    Allan.
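
    A quick sketch of that multi-column key idea - rather than concatenating key values, the EXISTS check can simply match on every key column (the two-part key ID_COL1/ID_COL2 is assumed here purely for illustration):

    -- Sketch only, assuming a hypothetical two-column key (ID_COL1, ID_COL2) instead of ID_COL.
    -- No concatenation is needed - compare each key column inside the EXISTS.
    DELETE T2
    FROM dbo.Table_2 T2
    WHERE NOT EXISTS (select * from dbo.Table_1 T1
            where T1.ID_COL1 = T2.ID_COL1
              and T1.ID_COL2 = T2.ID_COL2)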

  • allan.ford17 - Friday, October 19, 2018 1:22 AM

    Thanks Sergiy  !!

    One reason for wanting to process RBAR (Row by agonising row) might be to handle / capture errors that occur.
    e.g. bad data ... data that doesn't meet foreign key constraints or other constraints ..
    May want to capture the bad data but process as much good data as available.
    In some cases it is OK to assume all the data is good and go with a process that does not have error handling ..

    Programming by exception is not the greatest idea for database environments.
    There is no user to act on an exception, so you must pre-program all the actions for all possible scenarios.
    And if you know all the error scenarios and all the corresponding actions - there is no need to wait for exceptions.
    Check for bad data before uploading and exclude it from the set (reporting all the violation reasons on the way).
    The OUTPUT clause makes the task really easy.

    DELETE T2
    OUTPUT {list of columns}, 'Violation of FK to [Look-up Table]' AS Reason
        INTO [Audit Table]
    FROM dbo.Table_2 T2
    WHERE NOT EXISTS (select * from [Look-up Table] LT
            where LT.PK_Col = T2.FK_Col)

    Proceed with upload only after the source data have been cleaned up.
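
    To make that pattern concrete, a self-contained sketch with made-up staging, look-up and audit table names might look like this:

    -- Sketch only: dbo.StagingTable, dbo.LookupTable and dbo.LoadErrors are hypothetical names.
    -- Rows whose FK value has no match in the look-up table are removed from the staging set
    -- and written to the audit table with a reason, so the remaining rows can be loaded safely.
    DELETE S
    OUTPUT deleted.ID_COL, deleted.FK_Col, 'Violation of FK to dbo.LookupTable'
        INTO dbo.LoadErrors (ID_COL, FK_Col, Reason)
    FROM dbo.StagingTable S
    WHERE NOT EXISTS (select * from dbo.LookupTable LT
            where LT.PK_Col = S.FK_Col);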

    _____________
    Code for TallyGenerator
