Comparing variables and handling NULLs

  • allan.ford17 - Thursday, October 18, 2018 6:11 PM

    I wanted a process that does absolute minimal DB transactions ... i.e. there may be a row-level trigger on the table that logs transactions for other integration purposes. There may also be child records in child and grandchild tables ...

    Then this should work for you.

    1. Delete obsolete records:
    DELETE T2
    FROM dbo.Table_2 T2
    WHERE NOT EXISTS (select * from dbo.Table_1 T1
            where T2.ID_COL = T1.ID_COL)

    2. Update records with IDs present in both tables and having some differences:
    UPDATE T2
    SET NAME_VAL1 = T1.NAME_VAL1,
        DATE_VAL1 = T1.DATE_VAL1,
        NUMBER_VAL1 = T1.NUMBER_VAL1
    FROM dbo.Table_2 T2
    INNER JOIN dbo.Table_1 T1 ON T2.ID_COL = T1.ID_COL
    WHERE not (
            (T1.NAME_VAL1 = T2.NAME_VAL1 or (T1.NAME_VAL1 is null and T2.NAME_VAL1 is null ))
            and (T1.DATE_VAL1 = T2.DATE_VAL1 or (T1.DATE_VAL1 is null and T2.DATE_VAL1 is null ))
            and (T1.NUMBER_VAL1 = T2.NUMBER_VAL1 or (T1.NUMBER_VAL1 is null and T2.NUMBER_VAL1 is null ))
            )

    This script will cause a clustered index scan, so it's not very efficient.
    But it cannot be improved with the existing indexing.
    With additional indexing in place, more selective strategies may be chosen (see the index sketch after step 3).
    But again - it only pays off on big data sets.

    3. Add new records:
    INSERT INTO dbo.Table_2
        (ID_COL,NAME_VAL1,DATE_VAL1,NUMBER_VAL1)
    SELECT ID_COL,NAME_VAL1,DATE_VAL1,NUMBER_VAL1
    FROM dbo.Table_1 T1
    WHERE NOT EXISTS (select * from dbo.Table_2 T2
            where T2.ID_COL = T1.ID_COL)
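
    A minimal sketch of the kind of additional indexing meant above - a covering non-clustered index on the join key plus the compared columns (the index name is just a placeholder):

    -- Sketch only: one possible covering index on the source table (name is a placeholder).
    -- It gives the optimizer a narrower structure to read when comparing the value columns,
    -- instead of scanning the whole clustered index of dbo.Table_1.
    CREATE NONCLUSTERED INDEX IX_Table_1_IdCol_Covering
        ON dbo.Table_1 (ID_COL)
        INCLUDE (NAME_VAL1, DATE_VAL1, NUMBER_VAL1);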

    When I measured performance with SET STATISTICS TIME ON, the speed difference was > 100 times.
    But it's not an accurate measure, because the statistics collected on every statement in every loop take time themselves.
    And on such a small data set my 3 statements take < 1 mcs to complete.
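
    For reference, that statistics-based measurement is just the three statements wrapped like this (a sketch; the UPDATE and INSERT from steps 2 and 3 are elided):

    SET STATISTICS TIME ON;

    DELETE T2
    FROM dbo.Table_2 T2
    WHERE NOT EXISTS (select * from dbo.Table_1 T1
            where T2.ID_COL = T1.ID_COL);

    -- ... UPDATE from step 2 and INSERT from step 3 go here ...

    SET STATISTICS TIME OFF;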

    You must have bigger sets - try to execute it like this:
    declare @TimeStamp datetime2

    select * from dbo.Table_1;
    select * from dbo.Table_2;

    select @TimeStamp = SYSDATETIME()

    DECLARE  @return_value int

    EXEC  @return_value = [dbo].[TablesSyncByCompare]

    select [Execution time - mcs] = datediff(mcs, @TimeStamp, sysdatetime() )
    SELECT  'Return Value' = @return_value

    and compare to this:

    declare @TimeStamp datetime2

    select * from dbo.Table_1;
    select * from dbo.Table_2;

    select @TimeStamp = SYSDATETIME()

    DELETE T2
    ...

    UPDATE T2
    ....
    Insert dbo.Table_2
    ....

    select [Execution time - mcs] = datediff(mcs, @TimeStamp, sysdatetime() ) 

    _____________
    Code for TallyGenerator

  • Thanks Sergiy !!

    I will give this a try too!
    Good approach .. Could adapt for a multi-column primary key also .. (e.g. concat the key values into a single value perhaps ..) - see the sketch after this post.
    One reason for wanting to process RBAR (Row by agonising row) might be to handle / capture errors that occur.
    e.g. bad data ... data that doesn't meet foreign key constraints or other constraints ..
    May want to capture the bad data but process as much good data as available.
    In some cases it is OK to assume all the data is good and go with a process that does not have error handling ..

    cheers, thanks,
    Allan.
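
    A quick sketch of that multi-column key idea - rather than concatenating key values, the EXISTS check can simply match on every key column (the two-part key ID_COL1/ID_COL2 is assumed here purely for illustration):

    -- Sketch only, assuming a hypothetical two-column key (ID_COL1, ID_COL2) instead of ID_COL.
    -- No concatenation is needed - compare each key column inside the EXISTS.
    DELETE T2
    FROM dbo.Table_2 T2
    WHERE NOT EXISTS (select * from dbo.Table_1 T1
            where T1.ID_COL1 = T2.ID_COL1
              and T1.ID_COL2 = T2.ID_COL2)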

  • allan.ford17 - Friday, October 19, 2018 1:22 AM

    Thanks Sergiy  !!

    One reason for wanting to process RBAR (Row by agonising row) might be to handle / capture errors that occur.
    e.g. bad data ... data that doesn't meet foreign key constraints or other constraints ..
    May want to capture the bad data but process as much good data as available.
    In some cases it is OK to assume all the data is good and go with a process that does not have error handling ..

    Programming by exception is not the greatest idea for database environments.
    There is no user to act on an exception, so you must pre-program all the actions for all possible scenarios.
    And if you know all the error scenarios and all the corresponding actions - there is no need to wait for exceptions.
    Check for bad data before uploading and exclude it from the set (reporting all the violation reasons on the way).
    The OUTPUT clause makes the task really easy.

    DELETE T2
    OUTPUT {list of columns}, 'Violation of FK to [Look-up Table]' AS Reason
        INTO [Audit Table]
    FROM dbo.Table_2 T2
    WHERE NOT EXISTS (select * from [Look-up Table] LT
            where LT.PK_Col = T2.FK_Col)

    Proceed with upload only after the source data have been cleaned up.
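
    To make that pattern concrete, a self-contained sketch with made-up staging, look-up and audit table names might look like this:

    -- Sketch only: dbo.StagingTable, dbo.LookupTable and dbo.LoadErrors are hypothetical names.
    -- Rows whose FK value has no match in the look-up table are removed from the staging set
    -- and written to the audit table with a reason, so the remaining rows can be loaded safely.
    DELETE S
    OUTPUT deleted.ID_COL, deleted.FK_Col, 'Violation of FK to dbo.LookupTable'
        INTO dbo.LoadErrors (ID_COL, FK_Col, Reason)
    FROM dbo.StagingTable S
    WHERE NOT EXISTS (select * from dbo.LookupTable LT
            where LT.PK_Col = S.FK_Col);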

    _____________
    Code for TallyGenerator
