March 15, 2012 at 4:18 pm
Hi, I feel I should know this but for the life of me I can't think of an elegant solution...I have 100+ bit datatype columns in two tables and I'd like to know the rows where they are different.
Is there a standard method so I can avoid using multiple OR statements ?
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[mysource]') AND type in (N'U'))
DROP TABLE [dbo].[mysource]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[mytarget]') AND type in (N'U'))
DROP TABLE [dbo].[mytarget]
GO
create table mysource(pk_id int, col1 bit, col2 bit, col3 bit, col103 bit)
go
create table mytarget(pk_id int, col1 bit, col2 bit, col3 bit, col103 bit)
go
insert mysource select 1000,1,0,1,1 union all
select 1001,1,1,0,1 union all
select 1002,1,1,0,1
insert mytarget select 1000,1,0,1,0 union all
select 1001,1,1,0,1 union all
select 1002,1,1,1,0
select * from mysource a
join mytarget b on a.pk_id = b.pk_id
where
a.col1 <> b.col1 or
a.col2 <> b.col2 or
a.col3 <> b.col3 or
a.col103 <> b.col103
March 15, 2012 at 4:31 pm
Whats your desired output? which rows u want to see?
March 15, 2012 at 4:35 pm
oooops.. i dint see your last select statement..
Hmmmm.. i cant think of any other way than individually checking each columns!
March 15, 2012 at 5:00 pm
would this be a good use of Except?
Something like
select * from mytarget
except
select * from mysource
I've not had an opportunity use it myself.
March 15, 2012 at 5:23 pm
Yes, nice try however in this particular case its the except I'm trying to avoid. In addition to the 100+ bit type columns it also has 100+ 'text' datatype columns which proves to be a bit of a problem when running a comparison query across linked servers. We _could_ use except on only those columns we deem significant - if it weren't for the fact that the datatypes are different; 'text' in source is 'varchar(max)' in target, which would mean instead of just coding select col1, col2 except select col1, col2 we would have to cast each column first - which generates more code which I'm trying to avoid.
I guess I was hoping to be able to use some sort of transform based upon a bitwise operator - I'd still have to specify each column but at least I wouldn't have multiple OR lines.
i.e. <some_transform>(a.col1+a.col2+a.col3...+a.col103) <> <some_transform>(b.col1+b.col2+b.col3+b.col103) -- note that the '+' in this statement is not correct, I'm just using it for illustration - it would have to be something else
Its not a big drama - I just thought there would be some standard way to do it.
Edit:
Hmm, I wonder if I could generate a hash of the values ? Will look into this now...
March 15, 2012 at 6:02 pm
OK, for what its worth I've coded it like this...
-- updates
select
a.pk_id
from
mytarget a
join
mysource b
on a.pk_id = b.pk_id
where
(
-- 45 columns with non-significant data in
a.[col1]<>b.[col1]
orcast(b.[col2] as varchar(MAX))<> a.[col2]
...
or binary_checksum(a.[col56],a.[col57]...)
<>
binary_checksum(b.[col56],b.[col57]...)
)
It works and runs in about the same time (which surprised me) but frankly I don't trust it so I'm leaving all the OR statements in.
My last day of work today so don't want to rock the boat on the way out!
March 18, 2012 at 3:20 am
Not sure if this approach would be useful?
This method can also be used to determine which columns are alike.
IF OBJECT_ID('tempdb..#mysourcebitwise') IS NOT NULL DROP TABLE #mysourcebitwise
IF OBJECT_ID('tempdb..#mytargetbitwise') IS NOT NULL DROP TABLE #mytargetbitwise
create table #mysourcebitwise
(pk_id int,
col1 bit,
col2 bit,
col3 bit,
col4 bit,
binstate AS((col1 * POWER(2,3)) +
(col2 * POWER(2,2)) +
(col3 * POWER(2,1)) +
(col4 * POWER(2,0))
) PERSISTED
)
go
create table #mytargetbitwise
(pk_id int,
col1 bit,
col2 bit,
col3 bit,
col4 bit,
binstate AS((col1 * POWER(2,3)) +
(col2 * POWER(2,2)) +
(col3 * POWER(2,1)) +
(col4 * POWER(2,0))
) PERSISTED
)
go
insert #mysourcebitwise
select 1000,1,0,1,1 union all
select 1001,1,1,0,1 union all
select 1002,1,1,0,1
insert #mytargetbitwise
select 1000,0,0,1,0 union all
select 1001,1,1,0,1 union all
select 1002,1,1,1,0
select *
from #mysourcebitwise a
join #mytargetbitwise b on a.pk_id = b.pk_id
where a.binstate != b.binstate
March 18, 2012 at 4:46 pm
Yes, once I'd forced the left hand side to a float...
(col103 * POWER(2e1,102))
... this worked.
I guess at this point I'd be happy enough with the binary_checksum method - I'd accept a false positives (i.e. rows identified as diffs even if they are identical) as long as I could be sure that it wouldn't miss any that were truly diffs...
thanks
🙂
March 19, 2012 at 7:15 am
Tread with caution using binary_checksum. E.g. with 10 bit columns (1024 distinct combinations), there are only 256 distinct binary checksums:
WITH a AS ( SELECT CAST(0 AS BIT) val
UNION ALL
SELECT CAST(1 AS BIT) val
)SELECT
a.val col1 ,
b.val col2 ,
c.val col3 ,
d.val col4 ,
e.val col5 ,
f.val col6 ,
g.val col7 ,
h.val col8 ,
i.val col9 ,
j.val col10 ,
BINARY_CHECKSUM(*) bin_checksum
FROM a ,
a b ,
a c ,
a d ,
a e ,
a f ,
a g ,
a h ,
a i ,
a j
ORDER BY BINARY_CHECKSUM(*)
March 19, 2012 at 5:16 pm
...and it looks as though the situation doesn't get any better no matter how many columns you add you still get 256 distinct values - ah well, leave the OR's in and call it a day.
thanks to all for the input
🙂
March 19, 2012 at 9:55 pm
I don't know if it is applicable here, but this may be a good candidate for CDC, if you are using 2008, to detect if a change has been made and update appropriately.
March 19, 2012 at 10:51 pm
Yes, we had just moved our environment to SQL2K8 (4 weeks ago), the source operational database is not within our area to control so CDC would have to be agreed to by some other part of the business.
Given that under SQL2K5 we have had to roll our own ETL which essentially just versions each row of the ODS then I think CDC or more probably change tracking would certainly be a more maintainable approach.
As I say I don't work there any more so the solution is now moot!
Perhaps I'll be able to use it in my next job?
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply