columns unequal? (column might be null)

Question

Post reply

columns unequal? (column might be null)

Andreas Steidle

SSC Journeyman

Points: 85
More actions
December 28, 2004 at 12:40 pm

#60583

I am try to find unequal rows of two tables t1 and t2 on two different servers.
To find where two columns are unequal I am currently using
t1.Name<>t2.Name
or t1.Name is null and t2.Name is not null
or t1.Name is not null and t2.Name is null
If the columns may be null, does ti really need five expression in standard SQL to do this? Am I to stupid? Is there a special SQL-server function to do this? If a table has some 30 columns this will produce a lot of code...
I know about SQL-Servers SET ANSI_NULLS OFF so we could write
t1.Name<>t2.Name
but this reduces the cases where this statement can be used. However, would it run faster if I do it this way?
Thanks for any hints
Andreas

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Osoba SSCommitted Points: 1964 More actions · Answer 1

I would use a where exists (or not exists) statement.

Select [column] from a

where exists (select * from b where a.id=b.id)

noeld SSC Guru Points: 96590 More actions · Answer 2

Andreas the code that you posted is exactly what is needed for differences on nullable columns you may skip the is null part for non nullable.

Usually the fastest way to perform the comparison is to use

CHECKSUM_AGG(BINARY_CHECKSUM(*))

like:

SELECT

(SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM t1)

-

(SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM t2) /* 0 means OK*/

* Noel

Andreas Steidle SSC Journeyman Points: 85 More actions · Answer 3

Neol, thanks for the brilliant answer!

The BINARY_CHECKSUM(*) function was exactly what I needed! Let's hope it uses a good hash function so I will never miss any changed rows...

BINARY_CHECKSUM(t1.ColX)<>BINARY_CHECKSUM(t2.ColX)

also seems to do the same as my much longer unequal expression above 🙂

Too bad this function does not directly support a syntax like

WHERE BINARY_CHECKSUM(t1.*)<> BINARY_CHECKSUM(t2.*)

(no tablename allowed, only * or list of column names)

to generically compare all columns from two tables and list all different rows.

So now I am using

WHERE

(select binary_checksum(*) from TableVersion1 where TableVersion1.ID=t1.ID)

<>

(select binary_checksum(*) from TableVersion2 where TableVersion2.ID=t2.ID)

Any shortcut for this?

Thanks

Andreas

Bob Monahon Ten Centuries Points: 1282 More actions · Answer 4

Re: Any shortcut

This might be faster?

SELECT ta.ID as ta_id, ta_check, tb.ID AS tb_id, tb_check

FROM

(SELECT ID, binary_checksum(*) AS ta_check FROM TableVersion1) AS ta

INNER JOIN

(SELECT ID, binary_checksum(*) AS tb_check FROM TableVersion2) AS tb

ON ta.ID = tb.ID

WHERE ta_check <> tb_check

Alternative without the INNER JOIN notation:

SELECT ta.ID as ta_id, ta_check, tb.ID AS tb_id, tb_check

FROM

(SELECT ID, binary_checksum(*) AS ta_check FROM TableVersion1) AS ta

, (SELECT ID, binary_checksum(*) AS tb_check FROM TableVersion2) AS tb

WHERE ta.ID = tb.ID

AND ta_check <> tb_check

Regards,

Bob Monahon

Frank Kalis SSC Guru Points: 111183 More actions · Answer 5

Just a note on the use of a column with BINARY_CHECKSUM instead of * as has been mentioned above. Consider this:

CREATE TABLE #t

(

i1 INT IDENTITY

, c1 VARCHAR(50)

)

INSERT INTO #t VALUES ('a b c')

INSERT INTO #t VALUES ('ab c')

SELECT

(

SELECT BINARY_CHECKSUM(c1) FROM #t WHERE i1 =1)

-

(SELECT BINARY_CHECKSUM(c1) FROM #t WHERE i1 =2)

DROP TABLE #t

-----------

0

(1 row(s) affected)

Seeing the result of 0, one might think both columns are equal

Thanks to Adam Machanic, who originally posted the idea on the newsgroups.

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Olivier Van Rode-198661 SSChasing Mays Points: 655 More actions · Answer 6

I have a similar problem.

We base a calculation performed by a set of stored procs, on a set of input record tables.

Recently, we've been asked to implement the possibility to recalculate only the changed records from a child-set as compared to a parent set (eg say you correct the input records created yesterday, and want to rerun them as today's new set. You then identify today's set as being the child of yesterday's set. The procedure picks up this parenthood, and starts comparing the records from parent and child, to identify those records that were changed).

Using BINARY_CHECKSUM on every meaningful field, returned the strange impression of "yesterday's amount *1000 = yesterday's amount" ??? Apparently, the BINARY_CHECKSUM goodie has problems with NUMERICs.

Anyone willing to do a suggestion on how to deal with this?

Olivier Van Rode-198661 SSChasing Mays Points: 655 More actions · Answer 7

--DROP TABLE t_1

--DROP TABLE t_2

create table t_1

(int1 int, int2 int constraint pk_1 primary key clustered (int1,int2), int3 int)

create table t_2

(int1 int, int2 int constraint pk_2 primary key clustered (int1,int2), int3 NUMERIC(10,3) )

insert into t_1 values (1,1,1000.000)

insert into t_1 values (1,2,100.000)

insert into t_1 values (2,1,500.000)

insert into t_1 values (2,2,500.000)

insert into t_2 values (1,1,100.000)

insert into t_2 values (1,2,1)

insert into t_2 values (2,1,400)

insert into t_2 values (2,2,500)

SELECT BINARY_CHECKSUM( l.int3,l.int2),

binary_checksum(r.int3,r.int2),

CASE WHEN BINARY_CHECKSUM( l.int3,l.int2) <> binary_checksum(r.int3,r.int2)

THEN 'CHANGED'

ELSE 'UNCHANGED'

END, *

from t_2 l full outer join t_1 r on l.int1 = r.int1 AND l.int2 = r.int2

Olivier Van Rode-198661 SSChasing Mays Points: 655 More actions · Answer 8

Notice the nice constatation that 100.000 = 1000, and 1.000 = 100...