June 17, 2008 at 12:30 pm
Does anyone know how to compare two image fields to find out if they are the same or not?
June 17, 2008 at 3:25 pm
Look at the HashBytes function. The fields will be equal when their checksums are equal.
From BOL:
DECLARE @HashThis varbinary(max);
SELECT @HashThis = CONVERT(varbinary,0x1234123412341123412341223432243123545654345736576234535232452345234);
SELECT HashBytes('MD5', @HashThis);
GO
HTH
Piotr
...and your only reply is slàinte mhath
June 17, 2008 at 3:29 pm
HashBytes isn't guaranteed to be unique, so it's LIKELY they're the same (as in really likely). But you'd still want to double-check anyway (some byte by byte compare once you have likely candidates).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 17, 2008 at 3:30 pm
Or simply CAST to varbinary(max) and compare
declare @a table( id int, im image )
declare @b-2 table ( id int, im image )
insert into @a (id, im) values ( 1, cast ( replicate('-',8000) as varchar(max) ) + cast ( replicate('-',8000) as varchar(max) ) )
insert into @a (id, im) values ( 2, cast ( replicate('+',8000) as varchar(max) ) + cast ( replicate('+',8000) as varchar(max) ) )
insert into @b-2 (id, im) values ( 1, cast ( replicate('-',8000) as varchar(max) ) + cast ( replicate('-',7999) as varchar(max) ) + '+' )
insert into @b-2 (id, im) values ( 2, cast ( replicate('+',8000) as varchar(max) ) + cast ( replicate('+',8000) as varchar(max) ) )
select datalength(im) from @a
select datalength(im) from @b-2
select a.id, case when cast(a.im as varbinary(max)) = cast(b.im as varbinary(max))
then 0 else 1 end are_differrent
from @a a inner join @b-2 b on a.id = b.id
* Noel
June 17, 2008 at 3:32 pm
How about using BINARY_CHECKSUM?
June 17, 2008 at 3:33 pm
drodriguez (6/17/2008)
How about using BINARY_CHECKSUM?
FROM BOL:
... BINARY_CHECKSUM ignores columns of noncomparable data types in its computation. Noncomparable data types include text, ntext, image, cursor, xml, and noncomparable common language runtime (CLR) user-defined types
* Noel
June 17, 2008 at 3:55 pm
Right. Shouldn't have been so quick. Direct comparison is better than calculating checksum.
On the other hand, I never heard of a case that two different byte streams produce same checksum. All magnetic WORM storage solutions (like EMC) rely upon this fact. So checksum can be considered as unique.
Piotr
...and your only reply is slàinte mhath
June 17, 2008 at 4:17 pm
One problem with HASHBYTES that might make it of limited use is it doesn't support more than 8000 bytes for input.
If interested, you can vote on this issue at Connect: http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=273429
June 17, 2008 at 4:24 pm
Problem is - Hashbytes doesn't support the large data types, so it's constrained to 8K of data. Even with using a SHA1 hash (much more likely to be unique than MD5), you're not going up against the full binary stream.
Check this out - it's the Connect request to fix said issue....
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=273429
If you really want an "true" hash of the image, you will likely need to roll your own using SQLCLR, so that you can stream in the entire binary of the image.
EDit: that Todd sure is fast! beaten to the punch!:)
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 17, 2008 at 4:39 pm
It worked. Thanks for your guys help.
June 17, 2008 at 5:12 pm
I am disappointed. This function should really take longer parameters.
Thanks guys for the info.
Piotr
...and your only reply is slàinte mhath
June 17, 2008 at 7:08 pm
Heh... c'mon folks... if you know one of the images is correct, who cares about the other one? It takes longer to see if they're the same or not even if they turn out to be the same than it does to just replace with the known good image... no comparison required!
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2008 at 9:23 am
The reason for the request is compare with other rows that there aren't any duplicate images.
June 18, 2008 at 10:04 am
cast(a.im as varbinary(max)) = cast(b.im as varbinary(max))
Is not that bad, right ?
* Noel
June 18, 2008 at 10:08 am
drodriguez (6/18/2008)
The reason for the request is compare with other rows that there aren't any duplicate images.
Keep in mind that if the hash values of the two images are NOT the same, then the images are definitely not the same. So - it might be easiest to do a 2-pass scenario.
1. hash all of the binary columns to check. check to see if any of the hash values are equal.
2. do a varbinary comparison like noeld is recommending, but only between any rows with duplicate hash values.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply