How to detect changes in the data using T-SQL

  • Hi All

    Does anybody know how to detect changes in data using T-SQL, but without using triggers?

    I have a table that needs to be updated whenever any kind of data is inserted, updated, or deleted in any table. This table will indicate whether the data has changed, so the database can be flagged for a backup. The only problem is that there are a hundred users writing to the database at any given time, and using triggers can create a large amount of overhead, which I want to avoid.

    We did something once that might satisfy you, though honestly I'm not sure that your approach isn't better served with triggers, at least for setting an update date/time in the same table.

    What we needed was to track usage in a "sandbox" where people could create their own tables. We did not have control of the schema, so we couldn't depend on triggers, and we wanted to recognize stale tables (ones created and forgotten). So each week we run a script that uses CHECKSUM_AGG(BINARY_CHECKSUM(*)) and saves a checksum for each table in a modifications table. The following week we compare a new calculation against the prior week's to see if anything changed, and thus we know which tables have changed.

    It is fast and easy, but it is not 100% accurate. We found the checksum to be mediocre: certain changes can produce the same checksum far more frequently than one would expect (well, to be precise, with a 32-bit checksum I expected collisions to be virtually nonexistent, and we have only found a couple of cases so far, so it is not bad).
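    As a rough illustration of what that weekly comparison might look like (the table names here are made up, and our actual script is more involved):

        -- Store one aggregate checksum per watched table (hypothetical names)
        IF OBJECT_ID('dbo.TableChecksums') IS NULL
            CREATE TABLE dbo.TableChecksums
            (
                TableName    sysname  NOT NULL PRIMARY KEY,
                LastChecksum int      NULL,
                LastChecked  datetime NOT NULL DEFAULT (GETDATE())
            );

        DECLARE @NewChecksum int;

        -- One checksum over every row of the table being watched
        SELECT @NewChecksum = CHECKSUM_AGG(BINARY_CHECKSUM(*))
        FROM dbo.MyTable;

        -- Compare against last week's value
        IF EXISTS (SELECT 1
                   FROM dbo.TableChecksums
                   WHERE TableName = 'dbo.MyTable'
                     AND LastChecksum <> @NewChecksum)
            PRINT 'dbo.MyTable has changed since the last run';

        -- Record the new value for next week's comparison
        UPDATE dbo.TableChecksums
        SET    LastChecksum = @NewChecksum,
               LastChecked  = GETDATE()
        WHERE  TableName = 'dbo.MyTable';

        IF @@ROWCOUNT = 0
            INSERT INTO dbo.TableChecksums (TableName, LastChecksum)
            VALUES ('dbo.MyTable', @NewChecksum);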

    Knowing nothing about your application I hesitate to say this, but I've rarely seen cases where backing up individual tables really made sense in the long run. Are you sure you are solving a problem you need to solve by backing up tables triggered by changes, versus just backing up the entire database? How hard would it be to put the entire database back together from lots of pieces if you lost it?

    I agree. I think triggers are the way to go. I don't see why they would create a lot of overhead if you use them wisely. In my important tables where I need to know if and when modifications have been made, I use a datetime field. It has a default of getdate() so that inserts get a datetime automatically. Then it has an update trigger that sets that field to getdate() if it was not already set by the update query.

    No detectable overhead at all.
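    In case a concrete example helps, a minimal sketch of that pattern might look like this (the table, column, key, and trigger names are hypothetical):

        -- Add a last-modified column that defaults to the insert time
        ALTER TABLE dbo.Orders
            ADD LastModified datetime NOT NULL
                CONSTRAINT DF_Orders_LastModified DEFAULT (GETDATE());
        GO

        -- Keep the column current on updates, unless the update already set it
        CREATE TRIGGER trg_Orders_LastModified
        ON dbo.Orders
        AFTER UPDATE
        AS
        BEGIN
            SET NOCOUNT ON;

            IF NOT UPDATE(LastModified)
                UPDATE o
                SET    LastModified = GETDATE()
                FROM   dbo.Orders AS o
                INNER JOIN inserted AS i
                        ON o.OrderID = i.OrderID;   -- hypothetical key column
        END;
        GO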



    Thanks for the replies. I would just like to ask RawHide if you could perhaps give me some more details on using the datetime for data change notification. I understand that every time the data changes the datetime field will be updated, but I only need to be notified once. Let me explain my whole situation for better clarity.

    I have a database (actually about 150 of them), each containing about 250 tables. These databases are updated frequently at different times. I am using log shipping to update my databases. I have another database which has a table used to indicate any changes made to the data in each database. The table has two columns called databasename and active. If there are any changes made to any data in any database, the active column is set to 'Y' for the database that changed. Now, can you imagine a trigger on every table in every database to update this table to show active = 'Y'? I need this notification to happen once and once only, until the active column can be reset to 'N'. I thought about disabling the triggers in the database, but you cannot do this from within a trigger. Even if I create a stored procedure to disable the triggers, I cannot execute the stored procedure from the trigger.
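    For reference, here is roughly what that setup and a "flag once" trigger could look like (the database, table, and trigger names below are just placeholders):

        -- Central tracking table, one row per monitored database
        CREATE TABLE dbo.DatabaseActivity
        (
            databasename sysname NOT NULL PRIMARY KEY,
            active       char(1) NOT NULL DEFAULT ('N')
        );
        GO

        -- In each monitored database, a trigger per table would flag its
        -- database only when the flag is not already set
        CREATE TRIGGER trg_FlagDatabaseChanged
        ON dbo.SomeMonitoredTable
        AFTER INSERT, UPDATE, DELETE
        AS
        BEGIN
            SET NOCOUNT ON;

            UPDATE CentralDB.dbo.DatabaseActivity
            SET    active = 'Y'
            WHERE  databasename = DB_NAME()
              AND  active = 'N';   -- no row touched if already flagged
        END;
        GO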

    Any ideas will be greatly appreciated.

    Thanks

  • Well, my thinking was that you could query the field to see if the highest date in any of the tables is higher than the last time you did a backup.
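    Something along these lines, assuming each monitored table has such a column (the table and column names are illustrative):

        DECLARE @LastBackup datetime;
        SET @LastBackup = '20090101';   -- however you record the last backup time

        IF EXISTS
        (
            SELECT 1
            FROM
            (
                SELECT MAX(LastModified) AS MaxDate FROM dbo.Orders
                UNION ALL
                SELECT MAX(LastModified) FROM dbo.Customers
                -- ...one branch per monitored table
            ) AS t
            WHERE t.MaxDate > @LastBackup
        )
            PRINT 'Something changed since the last backup';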

    Have you already tried it with the triggers? If so, then the overhead that you're seeing may be because you have so many tables and databases that could potentially be trying to update the same table in your tracking db at once. Updates, of course, lock the table.

    Are deletes a possibility too? If not, here is a similar option that may work for you. Instead of using a field with a trigger that updates the datetime every time something happens, add a timestamp (rowversion) column to each table. Timestamps are binary numbers that are guaranteed to be unique within the database, and the value is automatically updated any time a row containing a timestamp column is inserted or updated.

    Then when it comes time to determine which db's to back up, you can check the @@DBTS of each database to see if it has been updated. You would need to track the timestamp value from when you last backed up the db.
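    A sketch of that approach, again with made-up names for the tables involved:

        -- Add a timestamp (rowversion) column to each table once
        ALTER TABLE dbo.Orders ADD RowVer timestamp;

        -- At check time, compare the database's last-used timestamp value
        -- against the value recorded at the previous backup
        DECLARE @LastBackupDbts binary(8);

        SELECT @LastBackupDbts = LastDbts
        FROM   CentralDB.dbo.BackupMarkers      -- hypothetical marker table
        WHERE  databasename = DB_NAME();

        IF @@DBTS > @LastBackupDbts
            PRINT 'At least one row was inserted or updated since the last backup';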



    The only real way to automate this is with a trigger. But the trigger can be really simple and just maintain the datetime column in the same table as mentioned above; then load your "backup" table just prior to the backup by querying each table as mentioned above.

    There are products to read the log and such (like Log Explorer), but I'm not sure how you would query them from your backup system. You could use Profiler, but I think the overhead there would be higher than that of the other method.

  • Hi all,

    Speaking of timestamps in the database, is it advisable to create created and updated timestamp columns on all tables of the database? We have one database here that is problematic and we are trying to re-model. One of the database architects I am working with is insisting on adding those timestamp columns to ALL tables in the database to monitor the changes. I am skeptical about it. I am thinking that not all tables in the database need to be monitored like that.

    What do the experts say about this?

    Thanks.

    Lhot

    We had someone do a similar thing at work and it caused, on average, 624 deadlocks a day because they used UPDATE instead of INSERT. And we only have about 100 users. They also caused "block outages" for several minutes, several times a day, while they processed this highly transactional target table. Be careful how you approach the problem of having hundreds of users hammering a small number of rows in a single table, whether you use triggers or not.

    --Jeff Moden



    I would say that you should only add them if you think the table needs to be monitored or if the timestamp data would be important. For example, I wouldn't add those fields to a static lookup table.

    To what Jeff posted, all I can say is WOW!!! There should be as close to zero deadlocks per day as possible. Things may happen occasionally, but on an average day there are no deadlocks in my database and very few delays because of regular locking.



  • Yup, once I identified the problem and fixed it, we went back to zero deadlocks.  Just wanted everyone to know that this type of thing, if done incorrectly, can really put a pinch on Trace 1204...
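    For anyone unfamiliar with it, "Trace 1204" here refers to trace flag 1204, which writes deadlock details to the SQL Server error log; enabling it globally looks like this:

        DBCC TRACEON (1204, -1);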

    --Jeff Moden



  • Yeah, I'm glad you mentioned that!! It's something people tend to not think about and then they wonder why it worked in development or test (with 1 user) and doesn't work in production.



    Guys, thank you so much for the input... it is all valuable to me. Jeff, I am interested in Trace 1204. Can you give me more of an idea of how updates on those tables with timestamps can cause the problems that show up under Trace 1204?

    Again, thank you for all your reliable input.

  • Sure, Marilou,

     

    To make a really long story shorter: if you have triggers, procs, whatever, all vying to update the same row in a tracking table as you suggest, the chances of deadlocks increase dramatically, especially if BEGIN TRAN/COMMIT pairs are used in code that updates more than one table at a time.

     

    I had a similar problem at work where some third-party geniuses didn't know about the IDENTITY property and made a sequence table, like in Oracle, instead. Again, long story short: instead of rewriting a couple hundred user procs, I rewrote the routines that get a NextID from the sequence table so that inserts occurred in a separate table (I couldn't change theirs) instead of updates by dozens of users/procs on a single row.

     

    It sounds as though your tracking table could cause the same type of problem under the load of many users. I would do inserts to the tracking table instead of updates to avoid the inevitable deadlocks that will be caused by using updates in this case. Once the backup occurs, truncate the tracking table instead of deleting from it.
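    A rough sketch of the insert-only variant (table and column names are made up):

        -- Append-only change log; concurrent sessions never fight over one row
        CREATE TABLE dbo.ChangeLog
        (
            ChangeID     int IDENTITY(1,1) NOT NULL PRIMARY KEY,
            databasename sysname  NOT NULL,
            ChangedAt    datetime NOT NULL DEFAULT (GETDATE())
        );
        GO

        -- Inside each monitored table's trigger, just insert
        INSERT INTO CentralDB.dbo.ChangeLog (databasename)
        VALUES (DB_NAME());

        -- At backup time: flag whatever appears in the log, then clear it
        SELECT DISTINCT databasename
        FROM   CentralDB.dbo.ChangeLog;

        TRUNCATE TABLE CentralDB.dbo.ChangeLog;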

    --Jeff Moden



    Pmfji (pardon me for jumping in), but there are a few things I'm not quite clear about...

    What is the end purpose of all this? It seems to me much like reinventing the wheel, using triggers to determine if something should be backed up or not...? If that is the end goal, knowing whether I should back up db x or not, then my suggestion is to drop the idea entirely. It's not going to work in the long run, and it has the potential of being an administrative nightmare.

    Just back up each db as 'regular' and be done with it - everything else is bound to break.

    /Kenneth

  • Finally, the voice of reason.  I agree with Kenneth 100%.

    --Jeff Moden


