December 8, 2017 at 8:05 pm
Jeff Moden - Wednesday, December 6, 2017 8:52 PM
I'll add that if HASHBYTES is different between any two rows, then the rows are guaranteed to be different but, if they are the same, there's not necessarily a guarantee that the rows are the same. You need to verify that rows with the same hash are, in fact, the same. I agree that the chances are very slim that they're not the same but you've already found out that "slim" doesn't mean "never". 😉
TomThomson - Friday, December 8, 2017 4:03 PM
Depending on how many different rows are possible and how many different rows you actually have, "slim" may be quite close to "never" or a long way away from it. Roughly speaking, the chance that two different rows picked at random have the same hash is about 1 in 1 000000 000000 000000 000000 000000 000000 000000 000000 (1 in 10**48), but if you've got 3 million different rows picked at random the chance that there's a pair that have the same hash is 9 trillion times that (so about 1 in 10**35), which is a lot further from zero.
odeddror - Friday, December 8, 2017 6:11 PM
Jeff,
Thank you
Oded Dror
Yep... agreed BUT the OP had collisions. And, even if collisions are near "never", it's still only "near" never. If you have two or more hashes that are identical then, no matter how slim the possibility of a collision, the right thing to do is still to check using another method to ensure that the rows actually are identical.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2017 at 9:38 pm
Jeff,
I will thank you
Oded Dror
December 9, 2017 at 4:31 pm
Jeff Moden - Friday, December 8, 2017 8:05 PM
Yep... agreed BUT the OP had collisions. And, even if collisions are near "never", it's still only "near" never. If you have two or more hashes that are identical then, no matter how slim the possibility of a collision, the right thing to do is still to check using another method to ensure that the rows actually are identical.
Yes, he has enough collisions to make me believe there's something strange going on. Assuming he's running 3 million or fewer new rows against his 3 million existing rows, the chances of getting 7 or 8 collisions are so small that if he gets that many even once in a thousand runs we can be rather certain that something very odd is going on - possibly something like what Nadrek suggested a couple of days ago.
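To put a rough number on "so small", here is a minimal Python sketch. It assumes a 160-bit hash (SHA-1-sized, which is what the 1 in 10**48 figure quoted above corresponds to), hash values that behave as uniformly random, and exactly 3 million new rows checked against 3 million existing rows. The function names are made up for illustration, and the tail estimate keeps only the leading Poisson term, which is reasonable only because the expected count is tiny.

```python
import math
from fractions import Fraction

HASH_BITS = 160            # assumption: SHA-1-sized output
EXISTING_ROWS = 3_000_000  # assumption taken from the post above
NEW_ROWS = 3_000_000       # assumption taken from the post above


def expected_cross_collisions(existing, new, bits):
    """Expected number of accidental hash matches between two sets of
    different rows, treating every hash value as uniformly random."""
    return Fraction(existing * new, 2 ** bits)


def log10_prob_at_least(mean, k):
    """log10 of the approximate chance of k or more accidental matches.

    For a tiny mean the Poisson tail is dominated by its first term,
    mean**k / k!, so only that term is kept here.
    """
    return k * math.log10(float(mean)) - math.log10(math.factorial(k))


mean = expected_cross_collisions(EXISTING_ROWS, NEW_ROWS, HASH_BITS)
print("log10(expected collisions per run):", math.log10(float(mean)))
print("log10(chance of 7 or more in one run):", log10_prob_at_least(mean, 7))
```

Under those assumptions the expected number of accidental matches per run is on the order of 1 in 10**35, and the chance of seeing 7 of them in one run has an exponent in the hundreds, so repeated collisions point at the inputs or the way they are concatenated rather than at bad luck.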
And the only way to be sure that two things are identical is to compare them directly. Hash comparisons are nothing other than a way to reduce the number of direct full comparisons needed because hash comparisons discover most of the non-matches cheaply. They certainly are not something that can prove a match, so when the hashes match you do a full compare.
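A minimal Python sketch of that "hash first, then full compare" pattern follows. It is an illustration only, not the OP's code: the rows are assumed to be tuples of values, `row_digest` and `find_true_duplicates` are hypothetical names, and SHA-1 via `hashlib` stands in for HASHBYTES. The length prefix on each value is an extra assumption added so that ("ab", "c") and ("a", "bc") do not feed identical bytes to the hash.

```python
import hashlib
from collections import defaultdict


def row_digest(row):
    """Hash a row (a tuple of values) with a length prefix per value."""
    h = hashlib.sha1()
    for value in row:
        data = str(value).encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))  # unambiguous field boundary
        h.update(data)
    return h.digest()


def find_true_duplicates(existing_rows, new_rows, digest=row_digest):
    """Return the new rows that really exist already.

    The hash only narrows down the candidates; a row is reported as a
    duplicate only after a direct comparison confirms it.
    """
    by_hash = defaultdict(list)
    for row in existing_rows:
        by_hash[digest(row)].append(row)

    duplicates = []
    for row in new_rows:
        candidates = by_hash.get(digest(row), [])
        if any(row == candidate for candidate in candidates):
            duplicates.append(row)
    return duplicates


existing = [("a", "b"), ("c", "d")]
incoming = [("a", "b"), ("e", "f")]
print(find_true_duplicates(existing, incoming))
```

Because the final decision is the direct comparison, the result stays correct even if the hash is weak or the inputs were concatenated badly; a poor hash only costs more full comparisons.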
Tom
December 9, 2017 at 9:35 pm
I totally agree that the number of collisions the OP has is a bit insane. I've not seen so many even with the old MD5 algorithm. Thanks for the feedback, Tom.
--Jeff Moden