April 15, 2010 at 3:46 pm
WayneS (4/15/2010)
bteraberry (4/15/2010)
Manual entry is always a problem, but by extending your constraint over such a large number of fields, the chances of entry errors invalidating the constraint are greatly increased.Do you mean decreased?
No, I mean increased. The more characters there are that are manually typed in, the greater the chance is that one of those characters will be typed incorrect. Thus, the greater the chance that what was intended to be a constraint will be foiled/invalid for actually preventing the duplication.
April 16, 2010 at 2:24 am
This sounds a little like the double-entry used for clinical trials records. The data is entered twice by two different DE ops. When all of the data has been entered (twice), the two sets of data are compared, and where they differ, a tie-breaker process is used.
There are two main processes here. Data entry, where there's no validation of one set against the other - all you want to do is bang in the data without interruption; then reporting, where the data set pairs are compared and discrepancies output. In neither process is a mythical 80-column key required. The safest way to compare each row pair is value by value - is the value of column 3 of set 1 (of patient 1/visit 1) the same as the value of column 3 of set 2 (of patient 1/visit 1)? "Checksum of a row" could be used to quickly identify row pairs with differences, but the mismatched values still have to be highlighted.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 16, 2010 at 5:40 am
Chris Morris-439714 (4/16/2010)
"Checksum of a row" could be used to quickly identify row pairs with differences, but the mismatched values still have to be highlighted.
I guess you threw this in because checksum had been suggested earlier, after you had described the right way of doing it (item by item comparison) but the idea strikes me as wrong and potentially very dangerous in a clinical context. To quote BoL: "For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change" (http://msdn.microsoft.com/en-gb/library/ms189788.aspx). BoL recommends using hashbytes instead of checksum, but I would stick to your original description of item-wise checking for this application despite hashbytes (MD5) being vastly better than checksum.
Tom
April 16, 2010 at 5:47 am
Tom.Thomson (4/16/2010)
Chris Morris-439714 (4/16/2010)
"Checksum of a row" could be used to quickly identify row pairs with differences, but the mismatched values still have to be highlighted.I guess you threw this in because checksum had been suggested earlier, after you had described the right way of doing it (item by item comparison) but the idea strikes me as wrong and potentially very dangerous in a clinical context. To quote BoL: "For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change" (http://msdn.microsoft.com/en-gb/library/ms189788.aspx). BoL recommends using hashbytes instead of checksum, but I would stick to your original description of item-wise checking for this application despite hashbytes (MD5) being vastly better than checksum.
You're absolutely right, Tom. My clinical trials knowledge is restricted to a year working on the phase 3 trials of Famotidine, with some work on Zoladex and Hexabrix thrown in. This was some time ago, about 1989 (Parexel subsequently became "famous" through the "Elephant man" trial), but if the same rules apply, then it would be column-by-column.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 16, 2010 at 5:54 am
Thanks.. Thats what i was looking for. Can you explain little more about tie breaker process.
April 16, 2010 at 6:07 am
anbillava (4/16/2010)
Thanks.. Thats what i was looking for. Can you explain little more about tie breaker process.
Sure.
Method 1: Enter the same row of data a third time and see if it matches one of the two existing rows. If yes, discard the row which does not match. If no, enter a fourth time.
Method 2: Show both rows together, like 2 rows in a spreadsheet, then use the original document as a reference to visually identify and manually correct mismatched values.
This is a business decision however, and well beyond the scope of a programming forum.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 16, 2010 at 6:53 am
Thanks to all here.... It was a very good discusssion... SQL Server central rocks 🙂
April 16, 2010 at 8:40 am
You're welcome.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
April 18, 2010 at 7:37 pm
Tom.Thomson (4/16/2010)
...I would stick to your original description of item-wise checking for this application despite hashbytes (MD5) being vastly better than checksum.
In defence of HashBytes...
HashBytes can use the MD2, MD4, MD5, SHA, or SHA1 hashing algorithms.
I would probably use HashBytes using SHA1 (salted for the truly paranoid), since column-by-column comparisons can be inefficient and error-prone to code (handling NULLs correctly, for one thing).
It depends though.
April 18, 2010 at 10:00 pm
I still can't believe that someone needs a 70 column primary key. Without actually understanding the table at hand, I believe that some normalization would be in order. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
April 19, 2010 at 6:53 am
Back in the old days of punch card data entry, it was standard practice for each set of data to be processed by two different people, and the data only stored on the computer if both sets of data matched.
Even with this and using trained staff, there was a 0.5% error rate. That means two separate people made the same mistake in typing 80 characters of data on a 1 in 200 basis.
Any form of manual data entry will have errors.
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
April 19, 2010 at 6:07 pm
Paul White NZ (4/18/2010)
In defence of HashBytes...HashBytes can use the MD2, MD4, MD5, SHA, or SHA1 hashing algorithms.
I would probably use HashBytes using SHA1 (salted for the truly paranoid), since column-by-column comparisons can be inefficient and error-prone to code (handling NULLs correctly, for one thing).
It depends though.
Well sure, using SHA1 will be better than MD5 by a long way, at least in terms of the properties of the two algorithms.
However, both SHA1 and MD5 suffer from a shared problem: they will both be wonderfully error-prone to code since the process of converting a set of columns into something you can feed into HashBytes is a bit of an error-prone nightmare.
Tom
April 19, 2010 at 6:23 pm
Jeff Moden (4/18/2010)
I still can't believe that someone needs a 70 column primary key. Without actually understanding the table at hand, I believe that some normalization would be in order. 😉
Anyone who thinks they need 70 columns for the PK is insane. OK, that's a ridiculous prejudicial statement. I still think it's true.
For this particular application we are looking for attempts to inroduce defferent data in non-key colums for a fixed primary key. The problem is trhat we want to alolow such attemnpts, not forbid them. It seems to me that by using two tables one can achieve exactly the required effect in SQLS, and the whole problem arises from trying to do it in one table.
Tom
April 20, 2010 at 2:28 am
Tom.Thomson (4/19/2010)
Jeff Moden (4/18/2010)
I still can't believe that someone needs a 70 column primary key. Without actually understanding the table at hand, I believe that some normalization would be in order. 😉Anyone who thinks they need 70 columns for the PK is insane. OK, that's a ridiculous prejudicial statement. I still think it's true.
For this particular application we are looking for attempts to inroduce defferent data in non-key colums for a fixed primary key. The problem is trhat we want to alolow such attemnpts, not forbid them. It seems to me that by using two tables one can achieve exactly the required effect in SQLS, and the whole problem arises from trying to do it in one table.
If the data were in two separate tables then I guess off-the-shelf tools like SQL Data Compare could be used for the analysis. IIRC we used to enter the same result set of clinical trials data twice to give two separate "tables" in SAS, then use PROC COMPARE(?) to compare the tables. With SQL Server there's no reason why the two sets couldn't coexist in the same table provided that they're identified: record 1, set 1, record 1, set 2 etc.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 21, 2010 at 5:09 am
At the risk of uttering a heresy, is SQL even the best place to do the comparison? Assuming the records are being keyed into some front-end system, might it not be better to do the comparison in there before the second record even gets saved?
If it's an OO system, you could wrap the comparison nicely up into some kind of equals method on the object.
Viewing 15 posts - 31 through 45 (of 46 total)
You must be logged in to reply to this topic. Login to reply