December 28, 2004 at 12:40 pm
I am try to find unequal rows of two tables t1 and t2 on two different servers.
To find where two columns are unequal I am currently using
t1.Name<>t2.Name
or t1.Name is null and t2.Name is not null
or t1.Name is not null and t2.Name is null
If the columns may be null, does ti really need five expression in standard SQL to do this? Am I to stupid? Is there a special SQL-server function to do this? If a table has some 30 columns this will produce a lot of code...
I know about SQL-Servers SET ANSI_NULLS OFF so we could write
t1.Name<>t2.Name
but this reduces the cases where this statement can be used. However, would it run faster if I do it this way?
Thanks for any hints
Andreas
December 28, 2004 at 1:14 pm
I would use a where exists (or not exists) statement.
Select [column] from
a
where exists (select * from
b where a.id=b.id)
December 28, 2004 at 1:21 pm
Andreas the code that you posted is exactly what is needed for differences on nullable columns you may skip the is null part for non nullable.
Usually the fastest way to perform the comparison is to use
CHECKSUM_AGG(BINARY_CHECKSUM(*))
like:
SELECT
(SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM t1)
-
(SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM t2) /* 0 means OK*/
* Noel
December 28, 2004 at 2:26 pm
Neol, thanks for the brilliant answer!
The BINARY_CHECKSUM(*) function was exactly what I needed! Let's hope it uses a good hash function so I will never miss any changed rows...
BINARY_CHECKSUM(t1.ColX)<>BINARY_CHECKSUM(t2.ColX)
also seems to do the same as my much longer unequal expression above 🙂
Too bad this function does not directly support a syntax like
WHERE BINARY_CHECKSUM(t1.*)<> BINARY_CHECKSUM(t2.*)
(no tablename allowed, only * or list of column names)
to generically compare all columns from two tables and list all different rows.
So now I am using
WHERE
(select binary_checksum(*) from TableVersion1 where TableVersion1.ID=t1.ID)
<>
(select binary_checksum(*) from TableVersion2 where TableVersion2.ID=t2.ID)
Any shortcut for this?
Thanks
Andreas
December 29, 2004 at 7:41 am
Re: Any shortcut
This might be faster?
SELECT ta.ID as ta_id, ta_check, tb.ID AS tb_id, tb_check
FROM
(SELECT ID, binary_checksum(*) AS ta_check FROM TableVersion1) AS ta
INNER JOIN
(SELECT ID, binary_checksum(*) AS tb_check FROM TableVersion2) AS tb
ON ta.ID = tb.ID
WHERE ta_check <> tb_check
Alternative without the INNER JOIN notation:
SELECT ta.ID as ta_id, ta_check, tb.ID AS tb_id, tb_check
FROM
(SELECT ID, binary_checksum(*) AS ta_check FROM TableVersion1) AS ta
, (SELECT ID, binary_checksum(*) AS tb_check FROM TableVersion2) AS tb
WHERE ta.ID = tb.ID
AND ta_check <> tb_check
Bob Monahon
December 29, 2004 at 7:53 am
Just a note on the use of a column with BINARY_CHECKSUM instead of * as has been mentioned above. Consider this:
CREATE TABLE #t
(
i1 INT IDENTITY
, c1 VARCHAR(50)
)
INSERT INTO #t VALUES ('a b c')
INSERT INTO #t VALUES ('ab c')
SELECT
(
SELECT BINARY_CHECKSUM(c1) FROM #t WHERE i1 =1)
-
(SELECT BINARY_CHECKSUM(c1) FROM #t WHERE i1 =2)
DROP TABLE #t
-----------
0
(1 row(s) affected)
Seeing the result of 0, one might think both columns are equal
Thanks to Adam Machanic, who originally posted the idea on the newsgroups.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
June 13, 2005 at 8:36 am
I have a similar problem.
We base a calculation performed by a set of stored procs, on a set of input record tables.
Recently, we've been asked to implement the possibility to recalculate only the changed records from a child-set as compared to a parent set (eg say you correct the input records created yesterday, and want to rerun them as today's new set. You then identify today's set as being the child of yesterday's set. The procedure picks up this parenthood, and starts comparing the records from parent and child, to identify those records that were changed).
Using BINARY_CHECKSUM on every meaningful field, returned the strange impression of "yesterday's amount *1000 = yesterday's amount" ??? Apparently, the BINARY_CHECKSUM goodie has problems with NUMERICs.
Anyone willing to do a suggestion on how to deal with this?
June 13, 2005 at 8:45 am
--DROP TABLE t_1
--DROP TABLE t_2
create table t_1
(int1 int, int2 int constraint pk_1 primary key clustered (int1,int2), int3 int)
create table t_2
(int1 int, int2 int constraint pk_2 primary key clustered (int1,int2), int3 NUMERIC(10,3) )
insert into t_1 values (1,1,1000.000)
insert into t_1 values (1,2,100.000)
insert into t_1 values (2,1,500.000)
insert into t_1 values (2,2,500.000)
insert into t_2 values (1,1,100.000)
insert into t_2 values (1,2,1)
insert into t_2 values (2,1,400)
insert into t_2 values (2,2,500)
SELECT BINARY_CHECKSUM( l.int3,l.int2),
binary_checksum(r.int3,r.int2),
CASE WHEN BINARY_CHECKSUM( l.int3,l.int2) <> binary_checksum(r.int3,r.int2)
THEN 'CHANGED'
ELSE 'UNCHANGED'
END, *
from t_2 l full outer join t_1 r on l.int1 = r.int1 AND l.int2 = r.int2
June 13, 2005 at 8:50 am
Notice the nice constatation that 100.000 = 1000, and 1.000 = 100...
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply