May 23, 2011 at 2:12 pm
HansB,
BTW I was thinking more about your post, and it also occurs to me that if you have a datetime column maintained by triggers, then you could simply use the query: [SELECT MAX(<datetimecol>) FROM Table] as your algorithm for detecting changes to the table, rather than bothering to use a CHECKSUM based approach.
Best regards,
Lawrence.
May 23, 2011 at 2:41 pm
In 2008 and 2008 R2 (might only be enterprise), there is the change data capture configuration that could also be put in place.
http://msdn.microsoft.com/en-us/library/bb522489.aspx
Regards,
Steve
May 23, 2011 at 2:49 pm
Lawrence Moore (5/23/2011)
Hi Eric,Many thanks for your post.
It is true that DMVs offer lots of useful information, some of which could be applied for requirements discussed in my article.
However, DMVs typically require elevated user permissions, such as VIEW SERVER STATE.
Regards,
Lawrence
I'd think that a process, which does something like querying for changes in a table and selecting the rows out to another table, would not be running under an application or user account. It would probably the SSIS or the SQL Agent account, in which case it would be OK to grant it VIEW SERVER STATE permission.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
May 23, 2011 at 3:05 pm
Lawrence/Eric:
You guys are right. Thanks for pointing that out to me.
May 23, 2011 at 6:34 pm
Although you may use better methods in SQL Server 2008 now, there are many companies which still run MSSQL 2000/2005 and I appreciate the time you took to write this for us.
Thank you.
Best regards,
Andre Guerreiro Neto
Database Analyst
http://www.softplan.com.br
MCITPx1/MCTSx2/MCSE/MCSA
May 23, 2011 at 6:59 pm
Lawrence.
It always goes back to the same question, what do we need to accomplish? In some cases CHECKSUM might be a valid option, (I am having a hard time coming up with an example in its favor, but that has more to do with the index scan than the symmetric update issue.) but there might be scenario's where this is a valid solution, as triggers might do the trick for (probably most) other people.
I was merely pointing out that the standard audit columns would alleviate the symmetric update problem, I for one still would use triggers.
Again, I do appreciate you taking the time you took, and sharing your stuff with us!
Hans
May 23, 2011 at 10:02 pm
I think if we add one more column to CheckSumTest Table - LstChgDate DateTime, and we update it every operation, insert, update etc.
The CheckSum_Agg() will also return diff value when symmetric update.:-P
May 24, 2011 at 8:18 am
1) Index usage DMV's are emptied when SQL Server is restarted http://msdn.microsoft.com/en-us/library/ms188755.aspx
2) When properly, precisely done checksums (all columns fixed width or very, very careful delimiting (CHAR(31), perhaps), COALESCE/ISNULLed to an impossible value, etc.) can tell you:
2a) If two rows are different
2b) If you need to check further into whether two rows are the same or not; regardless of the theoretical collision chances based on statistical assumptions, I've seen many reports of actual collisions, particularly since real (unencrypted) data very rarely looks random; you can get "Bobs" "cars" and "Bob" "scars" easily enough (see prior comments about properly and precisely done).
3) Lastupdate type columns yields false "the data changed" results if a table was rebuild even when all data except the lastupdate column is 100% identical.
4) For full comparisons, I prefer something like:
WHERE (
((Old.[field1] <> New.[field1]) OR ((Old.[field1] IS NULL) AND (New.[field1] IS NOT NULL)) OR ((Old.[field1] IS NOT NULL) AND (New.[field1] IS NULL)))
OR ((Old.[field2] <> New.[field2]) OR ((Old.[field2] IS NULL) AND (New.[field2] IS NOT NULL)) OR ((Old.[field2] IS NOT NULL) AND (New.[field2] IS NULL)))
)
Note that for tables with many columns, having an automated way to build these statements is much better than copy/paste/modify.
For anyone that insists on using checksums, I would recommend a scheduled process (weekly, monthly, quarterly) that does a thorough check for collisions. Gamblers hope for good look... we should check regularly for bad luck.
May 24, 2011 at 12:09 pm
If you understand what a checksum is then you know whether it is appropriate to use it. It is a type of hash, so sometimes it is appropriate and sometimes it isn't.
IMHO the business rules will dictate the usefulness rather than the collision probability.
I think it would be appropriate for BOL to point that out and either provide a reference to additional information or (2) explain how the aggregate is computed.
Good article though, created a great discussion, I'm giving you a 5 😎
May 24, 2011 at 12:22 pm
If we add a timestamp/rowversion column will the CHECKSUM_AGG() not always give accurate results?
May 24, 2011 at 3:55 pm
I'm looking for the most eficcient way to audit the changes on a particular table. Generaly you need to know which table, row and column where changed. I mean the option is triggers but expend very much time to create one by one on each table. Also we need to check inserted against deleted temp row column by column...
Does exist another way?
May 25, 2011 at 3:25 am
fmendes (5/24/2011)
If we add a timestamp/rowversion column will the CHECKSUM_AGG() not always give accurate results?
Yes I believe you are right, but this raises the question of what value the CHECKSUM_AGG() is bringing to the party, when you can simply use [SELECT MAX(rowversion) from TABLE] as your "has the table changed algorithm", which could be more efficient with an appropriate index.
Best regards,
Lawrence
May 25, 2011 at 3:28 am
arsinfor (5/24/2011)
I'm looking for the most eficcient way to audit the changes on a particular table. Generaly you need to know which table, row and column where changed. I mean the option is triggers but expend very much time to create one by one on each table. Also we need to check inserted against deleted temp row column by column...Does exist another way?
Hi arsinfor,
If you are using SQL 2008, then I believe the new change tracking features could be useful given your requirements.
http://msdn.microsoft.com/en-us/library/cc280462.aspx
However, once you start to talk about custom logic based around inserted/deleted rows, then I think you are in trigger territory.
Hope this helps,
Lawrence
May 25, 2011 at 11:44 am
fmendes (5/24/2011)
If we add a timestamp/rowversion column will the CHECKSUM_AGG() not always give accurate results?
Nope, not "always".
Yes, it will (almost always) fix the example given for 'symmetric change' (I think he called it)
But, it is a hash - so there are collisions.
A hash on a table that could contain millions of rows and a couple hundred columns is condensed (hashed) down to a single scalar value which is the hash value. So, logically there are some hash values that have more than one source.
(There are hashes which are 'perfect hashes' but those are restricted to a specific set of data and the source data must be known in advance. Not possible in this discussion)
HTH,
-Chris C.
May 25, 2011 at 2:22 pm
Thanks very much Lawrence.
I am using sql 2005 and I find in this address a possible solution.....
Viewing 15 posts - 16 through 30 (of 30 total)
You must be logged in to reply to this topic. Login to reply