Log Updates To Table

  • I have a user table with four fields that need to be manually verified (as a daily batch) by an administrator any time they're changed.  The four fields may be changed across a couple of different web pages.  The table does have a 'modified' column that records the date the last change occurred on a given row.

    If a change is made to one (or more) of those 4 fields I want to write a log record to a log table.  The log table would then be manually verified by the administrator and a pass/fail value assigned.  The goal is to keep a record of every unique set of values those 4 columns contain over the life of the record and whether or not those values pass/fail as a group.
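
    To make this concrete, the log table I have in mind would look roughly like the sketch below (table and column names are just placeholders for illustration, not my real schema):

        -- Hypothetical log table; names are placeholders only.
        CREATE TABLE dbo.UserFieldLog
        (
            LogId        int IDENTITY(1,1) NOT NULL
                         CONSTRAINT PK_UserFieldLog PRIMARY KEY CLUSTERED,
            UserId       int          NOT NULL,  -- key of the row in the user table
            Field1       varchar(100) NULL,      -- the four manually verified fields
            Field2       varchar(100) NULL,
            Field3       varchar(100) NULL,
            Field4       varchar(100) NULL,
            LoggedAt     datetime2(0) NOT NULL
                         CONSTRAINT DF_UserFieldLog_LoggedAt DEFAULT (SYSDATETIME()),
            ReviewStatus char(1)      NULL       -- NULL = unreviewed, 'P' = pass, 'F' = fail
        );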

    Strategy One

    Initially I thought a trigger would be a good fit.  It could write those four fields to the log table any time one of them changed.  My fear is that I'd end up with duplicates - that is, if field1 was changed an entry would be written, and then another if field2 was changed right after, when I only want the admin to have to check once all changes are completed for a given day.
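
    The kind of trigger I'm picturing is sketched below (placeholder names again; the EXISTS/EXCEPT comparison is a NULL-safe way of only logging when at least one of the four fields really changed):

        -- Sketch only: table, key, and field names are placeholders.
        CREATE TRIGGER trg_Users_LogVerifiedFields
        ON dbo.Users
        AFTER UPDATE
        AS
        BEGIN
            SET NOCOUNT ON;

            INSERT INTO dbo.UserFieldLog (UserId, Field1, Field2, Field3, Field4)
            SELECT i.UserId, i.Field1, i.Field2, i.Field3, i.Field4
            FROM   inserted i
            JOIN   deleted  d ON d.UserId = i.UserId
            WHERE  EXISTS (SELECT i.Field1, i.Field2, i.Field3, i.Field4
                           EXCEPT
                           SELECT d.Field1, d.Field2, d.Field3, d.Field4); -- at least one of the four changed
        END;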

    Strategy Two

    I then considered having a query run each night that would check for any records modified that day and write to the log table only when a record with those four values doesn't already exist there.  This would avoid duplicates (for example, field1 changed back to a previously checked value).  It doesn't seem as efficient as I'd like, though, since I may be checking several hundred modified records against several thousand log records looking for duplicates.
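
    The nightly job I'm picturing would be something like the sketch below (placeholder names; the INTERSECT inside the NOT EXISTS keeps the comparison NULL-safe):

        -- Sketch only: assumes a 'Modified' datetime column on the user table.
        INSERT INTO dbo.UserFieldLog (UserId, Field1, Field2, Field3, Field4)
        SELECT u.UserId, u.Field1, u.Field2, u.Field3, u.Field4
        FROM   dbo.Users u
        WHERE  u.Modified >= CAST(GETDATE() AS date)              -- modified today
        AND    NOT EXISTS (SELECT l.Field1, l.Field2, l.Field3, l.Field4
                           FROM   dbo.UserFieldLog l
                           WHERE  l.UserId = u.UserId
                           INTERSECT
                           SELECT u.Field1, u.Field2, u.Field3, u.Field4); -- this exact set was never logged for this user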

    Anyways, I'm open to suggestions.  There may be SQL features of which I'm unaware.

  • My opinion - it sounds like strategy one is the way to go.  This lets you log all changes, and you can then select just the changes you need to see.  What I mean is, use your SELECT query and some fancy T-SQL to build a view on the audit table so you only get the data you want.  You will have duplicates in the table - such as in your example where field1 changes and then field2 changes at a later time - but you can filter those out in your SELECT query.  Plus it allows you to view the data at any point in the day.
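
    For example, something along these lines (just a sketch, assuming an audit table shaped like the one sketched in the first post) would show the reviewer only the last logged set of values per user per day and hide the intermediate churn:

        -- Sketch: one row per user per day - the final logged values; adjust names to your schema.
        CREATE VIEW dbo.vw_UserFieldLog_DailyFinal
        AS
        SELECT LogId, UserId, Field1, Field2, Field3, Field4, LoggedAt, ReviewStatus
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY UserId, CAST(LoggedAt AS date)
                                      ORDER BY LoggedAt DESC, LogId DESC) AS rn
            FROM   dbo.UserFieldLog
        ) x
        WHERE  rn = 1;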

    The problem with option 2 is you need to wait for the job to run to pull the data.  And if the job runs at midnight (for example) and at 1:00 AM someone changes field 1 from "foo" to "bar" and then at 11:00 PM changes it from "bar" back to "foo", your daily job would completely miss that the value had changed (although this may be the desired effect?).

    The nice thing about the audit table is you can see exactly when the value changed, what it changed from, what it changed to, and who changed it, and you can pull the data as you see fit.  You want to know what the value was at the start of the day and at the current date/time?  You can do that.  You want to know the value from a month ago to today?  You can do that.  You want to know how many times the value changes per day/week/month/year?  You can do that.  All using strategy one.  If you use strategy two, you can only check what the value was when the job ran.
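
    For instance (still assuming the audit table sketched in the first post), the change count per user per day is just a GROUP BY away:

        -- Sketch: how many logged changes per user per day.
        SELECT UserId,
               CAST(LoggedAt AS date) AS ChangeDate,
               COUNT(*)               AS ChangesLogged
        FROM   dbo.UserFieldLog
        GROUP BY UserId, CAST(LoggedAt AS date)
        ORDER BY ChangeDate, UserId;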

    The disadvantage to the trigger approach is that triggers are "hidden maintenance".  What I mean is that it is EASY to forget they exist, and easy to forget that you MAY need to tune them to prevent performance issues down the road.  You build a trigger that works great inserting into an empty table with basically no performance hit, but you don't realize your clustered index is not designed for how the trigger is inserting data; when you hit 100,000 rows in the audit table (for example), you may end up with tons of scans and page splits and see huge performance hits.  And when those happen, it is easy to jump into a stored procedure and start digging into why, or to look at the base table and try to figure out why CRUD operations are suddenly slow, and completely forget that a trigger exists.
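
    One way to hedge against that (assuming an audit table like the earlier sketch, clustered on an ever-increasing identity so the trigger's inserts stay append-only) is to put the reviewer's lookups on their own nonclustered index, for example:

        -- Sketch: the identity clustered key keeps trigger inserts append-only (no page splits);
        -- this nonclustered index supports the reviewer's queries by user and date.
        CREATE NONCLUSTERED INDEX IX_UserFieldLog_User_Date
            ON dbo.UserFieldLog (UserId, LoggedAt)
            INCLUDE (ReviewStatus);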

    OR you end up with a poorly coded trigger and, instead of logging a single row when something changes, you copy the whole table into your audit table.  We had a developer do that before I started as a DBA at my current job.  Rather than using the INSERTED and DELETED tables, they used the source table, so every INSERT on the table resulted in the base table getting copied into the audit table.  Pretty quickly that audit table was using up 100 GB when the base table was only a few MB.  It was an easy fix to update the trigger, but a pain in the butt trying to determine what was happening and why.
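
    To make that concrete, the difference inside the trigger body is roughly this (placeholder names):

        -- The anti-pattern: reads the whole base table every time the trigger fires.
        INSERT INTO dbo.UserFieldLog (UserId, Field1, Field2, Field3, Field4)
        SELECT UserId, Field1, Field2, Field3, Field4
        FROM   dbo.Users;

        -- The fix: read only the rows touched by the triggering statement.
        INSERT INTO dbo.UserFieldLog (UserId, Field1, Field2, Field3, Field4)
        SELECT UserId, Field1, Field2, Field3, Field4
        FROM   inserted;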

    Another option (I can't remember if 2016 has it in Standard or Enterprise only) is CDC - change data capture.  It may or may not be what you are looking for.
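
    If CDC does turn out to be available on your edition, enabling it for just those columns looks roughly like this (schema, table, and column names are placeholders, and you would still have to build the pass/fail review step on top of the change tables it creates):

        -- Sketch: enable CDC on the database, then on the table for only the columns of interest.
        EXEC sys.sp_cdc_enable_db;

        EXEC sys.sp_cdc_enable_table
             @source_schema        = N'dbo',
             @source_name          = N'Users',
             @role_name            = NULL,
             @captured_column_list = N'UserId, Field1, Field2, Field3, Field4';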

    TL;DR - I like strategy one as you have all the data in real time that you can monitor, manipulate, and present as you see fit.  Strategy Two results in potential for missed changes in the event that the data is changed multiple times in a single day and ends up back at the original value by the time the job runs.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • I too vote for One.

    I assume the table has a unique key column(s).  If not, it will be very difficult to do what you want.

    Your stated requirements seem to contradict each other.

    "The goal is to keep a record of every unique set of values those 4 columns contain over the life of the record and whether or not those values pass/fail as a group."

    vs.

    "I only want the admin to have to check once all changes are completed for a given day."

    If you need the first of those, you will need to have someone check every entry, because the pass/fail apparently has to be done by a human.  By the way, is that really true?  Are there no rules you could use to have code grade the pass/fail status instead of having a person review the data?

    If the second, a query (or queries) can consolidate the changes and show only the final result to the person grading the pass/fail.  Don't get trapped into thinking that just because the raw data has multiple rows, all of them must be shown to the person doing the grading.
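
    For example (using the hypothetical audit table from the first post), a query like the sketch below hands the grader exactly one row per user - the final values for the day - and skips any set of values that has already been graded before:

        -- Sketch: final values of the day per user, excluding sets already graded earlier.
        WITH FinalToday AS
        (
            SELECT UserId, Field1, Field2, Field3, Field4, LoggedAt, LogId,
                   ROW_NUMBER() OVER (PARTITION BY UserId
                                      ORDER BY LoggedAt DESC, LogId DESC) AS rn
            FROM   dbo.UserFieldLog
            WHERE  CAST(LoggedAt AS date) = CAST(GETDATE() AS date)
        )
        SELECT f.UserId, f.Field1, f.Field2, f.Field3, f.Field4
        FROM   FinalToday f
        WHERE  f.rn = 1
        AND    NOT EXISTS (SELECT g.Field1, g.Field2, g.Field3, g.Field4
                           FROM   dbo.UserFieldLog g
                           WHERE  g.UserId = f.UserId
                           AND    g.ReviewStatus IS NOT NULL      -- already graded
                           INTERSECT
                           SELECT f.Field1, f.Field2, f.Field3, f.Field4);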

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • thelenj wrote:

    Anyways, I'm open to suggestions.  There may be SQL features of which I'm unaware.

    My question would be... Does the reviewer need to know which columns actually changed?  Do they need to see the Before'n'After values of each column or, since the table isn't that wide, just the previous content of the rows that changed from the day before alongside their current content?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you for helping me think through this - the pros and cons presented lined up well with my thoughts on the matter.  I'm all set.
