September 26, 2015 at 12:39 am
John Hanrahan (9/25/2015)
I trap the errors but then display good info. Our users are taught to come get one of us so we can resolve it right then and there.
Ah 🙂 I remember the days when we wrote desktop APPs and strolled round to a user with a problem looking over their shoulder whilst they explained chapter-and-verse what they did that triggered the issue ....
.... nowadays all our stuff is web based for General Public use and we never get to see them, and very rarely hear from them if they have a problem. They most probably Click-Off and give their business to a competitor, so my Log File is intended to try to give me the same experience as I used to get standing alongside someone's desk back in the Good Old Days!
We stop short of capturing a screen-shot of what they are doing (too much disk space for too infrequent a need), but the tracking we have enables me, usually, to recreate exactly what steps they took and what data they saw. The trick is in deciding that their session resulted in failure, from their PoV, and deciding to investigate it. Any user that takes the trouble to send Support a snotty "Your site is rubbish" email is worth their weight in gold as an alert of a potential issue. Sadly 99% of those turn out to be pilot-error 🙁 although we do try to reduce the incidence of those when we discover that is happening.
September 26, 2015 at 12:44 am
Mickey Stuewe (9/25/2015)
7. Don't keep the audit log in your transaction database. Make a separate database for your logs. This will keep the backup and restore times shorter. It will also keep your transaction database backups smaller.
Not sure about this. I think we would need the Full Recovery Model on the Audit database, and we would also have the additional hassle (as I see it) of making sure that any restore was synchronised, so both databases would have to be restored using STOPAT set to the same time.
It also seems to me that disk space and disk I/O would be much the same either way, and a purge on the audit tables is going to block similarly whether they are separate or all-in-one.
Hence I'm not seeing a benefit, but I'm probably overlooking something obvious!
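To illustrate the synchronisation point: as I understand it, both restores would have to be stopped at the same moment, something like this (database and backup file names are hypothetical):

-- Illustrative only (hypothetical database and backup file names):
RESTORE DATABASE MyApp   FROM DISK = N'D:\Backup\MyApp_full.bak'   WITH NORECOVERY;
RESTORE LOG      MyApp   FROM DISK = N'D:\Backup\MyApp_log.trn'    WITH STOPAT = '2015-09-26 00:30:00', RECOVERY;

RESTORE DATABASE MyAudit FROM DISK = N'D:\Backup\MyAudit_full.bak' WITH NORECOVERY;
RESTORE LOG      MyAudit FROM DISK = N'D:\Backup\MyAudit_log.trn'  WITH STOPAT = '2015-09-26 00:30:00', RECOVERY;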
September 26, 2015 at 10:03 am
At one point I would have adopted some variation or amalgamation of what has been written here.
Now I would question why logging needs to be in the DB at all if it is queried on a sometimes-maybe-never basis.
Why would I squander expensive storage I/O, backups, DR routines, etc. without an explicit business case to do so?
I'm not saying don't log; just don't make a DB the first option.
If you do want to query the logs, why not agree an SLA, preferably one that doesn't set the expectation of instant access?
Why do you need two-phase commit and resilience on the log file? If you are trying to track down errors, then the chances are that the mystery error will crop up again shortly, and if it doesn't, what makes the error so important?
Even stuff that is supposed to be logged for regulatory reasons is worth running by your legal people. You may find that you have made a rod for your own back and are implementing something more comprehensive and expensive than was intended or was asked for.
Let us suppose that logging is to a time-stamped file rather than a DB. Keep it local for as short a time as practical, archive it at the first opportunity, and only BULK INSERT it into the DB if you must. Hint: Logstash, Elasticsearch and Kibana might be a better option.
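As a rough sketch of that "only if you must" step (the table and file names here are purely illustrative), the import is no more than something like:

-- Illustrative only: a staging table plus a one-off BULK INSERT of an
-- archived, time-stamped log file, done on demand rather than in real time.
CREATE TABLE dbo.AppLogStaging
(
    LogTime   datetime2(3)   NOT NULL,
    SessionId varchar(50)    NOT NULL,
    Severity  varchar(10)    NOT NULL,
    Message   nvarchar(4000) NULL
);

BULK INSERT dbo.AppLogStaging
FROM 'E:\LogArchive\applog_20150926_0030.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2, TABLOCK);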
September 27, 2015 at 2:57 am
David.Poole (9/26/2015)
If you do want to query the logs, why not agree an SLA, preferably one that doesn't set the expectation of instant access?
Because in the example I gave, "Marketing emails n00,000 people with some special offer", which then goes sour, and the company is losing money, and probably good customers too, until the problem is fixed. Another time we are really pro-active in real time is just after the launch of a new version. We put a lot of effort into QA before launch, but if errors get through we want to be sure we have Found & Fixed them really quickly. Of course we could adopt a different logging strategy at such a time - but I would be against that because of the added complexity / one more thing to go wrong, etc.
But in the main I agree we could examine logs after-the-fact, and usually do. Trouble is, when something unexpected happens we are expected to, and want to, fix it promptly.
Why do you need two-phase commit and resilience on the log file? If you are trying to track down errors, then the chances are that the mystery error will crop up again shortly, and if it doesn't, what makes the error so important?
You are quite right that we don't need all the luxury of logging etc., but I don't think the fact that the error may crop up again helps - we still need to find one instance (which is solvable 🙂 ) and resolve it. One instance may tell us that it happened, but not the Why. Frequency of occurrence is important - a "once" error is low priority to fix, but we need the stats to know that. We may not be able to reproduce it easily, or at all, in-house. In such a case it often takes numerous logged instances to find the Why - e.g. where a user has logged on from another machine / different browser and has / has not had the same error, thereby giving us a comparison from which to deduce the cause.
Let us suppose that logging is to a time-stamped file rather than a DB
The main problem I perceive with that is that the log file is time-based rather than session-based, so we are going to have to import it, and later delete it, in order to be able to analyse it in the order we want. I also can't get (can I?) a real-time indication of the count & type of errors. We report in real time on error types; we have lots of "just in case useful" errors being logged, which we suppress from the stats, but the moment something unusual spikes in the stats we get an alert.
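To give a feel for it, the roll-up behind that alert is roughly of this shape - the ErrorLog table, columns and threshold below are illustrative, not our actual schema:

-- Illustrative sketch: error counts by type over the last 10 minutes,
-- with the "just in case useful" noise suppressed from the stats.
SELECT   ErrorType,
         COUNT(*) AS ErrorCount
FROM     dbo.ErrorLog
WHERE    LoggedAt >= DATEADD(MINUTE, -10, SYSDATETIME())
         AND Severity <> 'Info'        -- suppress the noise
GROUP BY ErrorType
HAVING   COUNT(*) > 20;                -- arbitrary spike threshold for alerting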
We could output to files, move them to "cheaper" storage and bulk import them (indeed, we could do that in "close to real time") but we still need to be able to see what the associated Customer / Order / Product status is which might be causing the problem. i.e. we need the whole database. It seems to me that it would be a lot of work / complexity replicating all that data just so that we don't bloat the Production database with the Log data.
Hence in our case, way-back-when, we took the decision to do this all within the Production database. I would definitely consider a more lightweight approach though, but I'd be loath to lose the real-time abilities we have.
Our databases are modest size (tens of GB) so maybe it is a right-sized solution for us that would not be the case for bigger outfits. Even so, 50% or more of the database size will be our logging data, and that is with a retention period of only 7 days.
September 27, 2015 at 5:14 am
Kristen, it sounds like you chose the right thing for your company, your use case and the size of problem.
There are no absolutes. The epiphany I had was when I was taught the Five Whys: drilling down deeper and deeper until the ACTUAL, true problem is revealed. The trick is to recognise when you are solving a problem or just a symptom of a problem.
Because humans are problem-solving animals, they tend to present their requirements in terms of solutions to the symptoms rather than the actual problem they are trying to fix, let alone the root cause.
I know of one chap who used to categorise and report on errors in a log file as he saw it as vital information to help improve the systems generating those errors. In the end he left the company because the people who could authorise the fix just didn't bear any of the pain for it NOT being fixed.
Another symptom I have seen is an extensive logging system in the DB because the calling application was built for the perfect user who could be trusted to do what the designers intended. Had the system been built to consider that people might do a wide range of bizarre things, then the system itself would have better served the customer's needs rather than keeling over and bombing the DB with error messages.
September 27, 2015 at 6:38 am
David.Poole (9/27/2015)
they tend to present their requirements in terms of solutions to the symptoms rather than the actual problem they are trying to fix
Very true 🙂 If I had $1 for every post I have seen on the forums asking how to build the OP's chosen solution to the problem, rather than detailing the problem and asking for suitable solutions 😉
the people who could authorise the fix just didn't bear any of the pain for it NOT being fixed
...
an extensive logging system in the DB because the calling application was built for the perfect user who could be trusted to do what the designers intended
Reminds me of another error log type we have - what we call Data Validation Errors. This is where we have an overnight bulk data import from a 3rd party database and no interactive way to alert users to any goofy data encountered. The only fix for the goofy data is for the Client to fix the source database, so we figure we should just report it. We have a table that logs the issues (some are critical and prevent import of that record, some are warnings where we think the data should be cleaned up - an email address containing multiple "@", for example).
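For illustration only (a simplified sketch, not our actual schema), the log table is along these lines:

-- One row per data issue found during the overnight import; 'C' rows block
-- that record from importing, 'W' rows import but should be cleaned at source.
CREATE TABLE dbo.DataValidationError
(
    ErrorId     int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ImportRunId int            NOT NULL,
    SourceTable sysname        NOT NULL,
    SourceKey   nvarchar(200)  NOT NULL,
    Severity    char(1)        NOT NULL,   -- 'C' = critical, 'W' = warning
    Description nvarchar(1000) NOT NULL,
    LoggedAt    datetime2(0)   NOT NULL DEFAULT SYSDATETIME()
);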
We have a client with n0,000 such logged errors. It beggars belief. We have drawn management's attention to the problem, and they were aghast that their data was so shoddy, as they are living & dying by the decisions they take, daily, based on it. In fact we've drawn management's attention to it several times, and they have been aghast several times! The n0,000 number now is bigger than it used to be ... and the count of "recent" errors (if that was all that anyone bothered to fix) is bigger than I think is sensible. The actual worker-bees don't seem to pay any attention to our Data Validation Errors Report - which I thought was Some Of My Best Work :crying:
What can you do? :angry:
September 28, 2015 at 9:34 am
Thanks for this editorial, Steve.
It would be great if some form of detailed auditing were available as a basic part of an RDBMS. It is such a crucial function that it seems odd to me that it has to be added on by individual DBAs or developers. I realize most databases have their own unique structures, but I don't think that is a good reason not to have a setting that allows logging of data and/or schema changes to a separate utility database if that feature is enabled for the db. I guess Change Tracking and CDC do this, but I am reading about a lot of caveats and limitations. Why is this so hard to do?
- webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
September 29, 2015 at 4:33 pm
I use a trigger that I modified from Pop Rivett's column (https://www.simple-talk.com/content/article.aspx?article=301#forum) from 2006. Put the trigger on any table to be audited and you get one record per field changed. I'm not sure how it will work under real load; hopefully it'll be OK. Since we're dealing with student data and some medical and related info, we want the ability to see changes over time and to be able to undo intentional corruption if a user decides to screw with the system. Multiple tables all go into the same audit table, which gives me a solid timeline if something goes bad, and the trigger has code to let me ignore fields or certain data types if I so desire.
The audit table looks like this:
Type | TableName | PK                              | FieldName    | OldValue | NewValue | UpdateDate                  | UserName           | EntryNum
I    | CaseLoad  | <StudentID=15><SeekerUserID=42> | StudentID    | NULL     | 15       | 2015-09-29 11:21:57.6830000 | mydomain\waynewest | 16400
I    | CaseLoad  | <StudentID=15><SeekerUserID=42> | SeekerUserID | NULL     | 42       | 2015-09-29 11:21:57.6830000 | mydomain\waynewest | 16399
I    | CaseLoad  | <StudentID=14><SeekerUserID=42> | StudentID    | NULL     | 14       | 2015-09-29 11:21:55.9070000 | mydomain\waynewest | 16398
I    | CaseLoad  | <StudentID=14><SeekerUserID=42> | SeekerUserID | NULL     | 42       | 2015-09-29 11:21:55.9070000 | mydomain\waynewest | 16397
I    | CaseLoad  | <StudentID=13><SeekerUserID=42> | StudentID    | NULL     | 13       | 2015-09-29 11:21:53.8000000 | mydomain\waynewest | 16396
I    | CaseLoad  | <StudentID=13><SeekerUserID=42> | SeekerUserID | NULL     | 42       | 2015-09-29 11:21:53.8000000 | mydomain\waynewest | 16395
I    | CaseLoad  | <StudentID=12><SeekerUserID=42> | StudentID    | NULL     | 12       | 2015-09-29 11:21:51.4730000 | mydomain\waynewest | 16394
I    | CaseLoad  | <StudentID=12><SeekerUserID=42> | SeekerUserID | NULL     | 42       | 2015-09-29 11:21:51.4730000 | mydomain\waynewest | 16393
I    | CaseLoad  | <StudentID=11><SeekerUserID=42> | StudentID    | NULL     | 11       | 2015-09-29 11:21:48.7470000 | mydomain\waynewest | 16392
I    | CaseLoad  | <StudentID=11><SeekerUserID=42> | SeekerUserID | NULL     | 42       | 2015-09-29 11:21:48.7470000 | mydomain\waynewest | 16391
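For anyone who hasn't seen that approach, here is a rough sketch of the shape of it. The table matches the columns above; the trigger is a simplified, hand-coded illustration for a hypothetical two-column CaseLoad table (insert only), not the generic trigger from the article:

CREATE TABLE dbo.AuditLog
(
    EntryNum   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [Type]     char(1)        NOT NULL,          -- I / U / D
    TableName  sysname        NOT NULL,
    PK         nvarchar(500)  NOT NULL,          -- e.g. <StudentID=15><SeekerUserID=42>
    FieldName  sysname        NOT NULL,
    OldValue   nvarchar(4000) NULL,
    NewValue   nvarchar(4000) NULL,
    UpdateDate datetime2(7)   NOT NULL DEFAULT SYSDATETIME(),
    UserName   sysname        NOT NULL DEFAULT SUSER_SNAME()
);
GO
-- Simplified illustration: one audit row per audited column.
CREATE TRIGGER trg_CaseLoad_Audit ON dbo.CaseLoad
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT dbo.AuditLog ([Type], TableName, PK, FieldName, OldValue, NewValue)
    SELECT 'I', 'CaseLoad',
           '<StudentID=' + CONVERT(varchar(20), i.StudentID)
         + '><SeekerUserID=' + CONVERT(varchar(20), i.SeekerUserID) + '>',
           v.FieldName, NULL, v.NewValue
    FROM   inserted AS i
    CROSS APPLY (VALUES ('StudentID',    CONVERT(nvarchar(4000), i.StudentID)),
                        ('SeekerUserID', CONVERT(nvarchar(4000), i.SeekerUserID))
                ) AS v (FieldName, NewValue);
END;

(The real trigger obviously has to handle updates and deletes and every audited column as well, which is where the bulk of the code goes.)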
I have another table and wrapper system similar to Kristen's for monitoring the execution of SPs. I only have four SPs currently, two that can be run by certain types of users and two that are run by the system on 12/31 and probably 09/01, so that's not as important.
I do like these as we'll be able to see changes over time. I want to have a question mark icon next to certain fields so that the history of that field can be viewed, but that's a feature for much further down the road. The elements are there, I just have to get the code working in the front end app.
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]
September 29, 2015 at 8:05 pm
Wayne West (9/29/2015)
I use a trigger that I modified from Pop Rivett's column (https://www.simple-talk.com/content/article.aspx?article=301#forum) from 2006. Put the trigger on any table to be audited and you get one record per field changed. I'm not sure how it will work under real load; hopefully it'll be OK.
The first data-change table auditing I did was like that (at the client's request). I mechanically generated the triggers (rather than using dynamic SQL) - this was back in 2001 🙂
The triggers were massive, the audit data generated was massive (there is a cost overhead for each field changed to store the date, UserID, table name and column name as well as the Before/After data), and the data was extremely hard to report on. I can't remember if the client ever said that performance was an issue; nearly all their data changes, at that time, would have been single-row due to data entry staff updating records.
If very few columns in a row change, I can see it could be slim, but adding a new row adds lots of rows to the audit table, of course.
September 30, 2015 at 7:09 am
lshanahan (9/25/2015)
j-1064772 (9/25/2015)
Logging might get complicated during the execution of a multi-statement stored procedure when the various steps are within a transaction. Has anyone considered the possibility of writing to a log file with basic "columns" like date and time, stored proc name, parameters, and error/status code? The logging will take place regardless of the use of a transaction.
I make extensive use of TRY...CATCH constructs wrapped around explicit transactions and call an error logging sp in the catch block.
That's how we do it too.
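Roughly like this, in our case - the procedure and dbo.LogError names below are just for illustration, not our actual code:

-- Sketch of the pattern: explicit transaction inside TRY, roll back and log
-- in CATCH (logging after the rollback so the log row itself is not undone).
CREATE PROCEDURE dbo.DoSomething
    @OrderId int
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        BEGIN TRANSACTION;

        UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderId = @OrderId;
        -- ... further statements in the same unit of work ...

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0
            ROLLBACK TRANSACTION;

        DECLARE @ErrNum int            = ERROR_NUMBER(),
                @ErrMsg nvarchar(4000) = ERROR_MESSAGE(),
                @Params nvarchar(200)  = N'@OrderId=' + CONVERT(nvarchar(20), @OrderId);

        EXEC dbo.LogError               -- hypothetical error-logging proc
             @ProcName   = N'dbo.DoSomething',
             @Parameters = @Params,
             @ErrorNum   = @ErrNum,
             @ErrorMsg   = @ErrMsg;

        THROW;   -- re-raise (SQL 2012+); use RAISERROR on older versions
    END CATCH
END;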
September 30, 2015 at 12:58 pm
My solution deals with ETL process logging and SSIS. Because I'm an ETL person.
I created an audit table that included the package name, the instance ID, the start events (I love me some containers), the error criticality, the row counts, the end events, and on error event handling that included the entire error message.
I used a stored procedure to create the audit records, but now that SSIS has evolved a bit I would look into using more of the event handlers. The downside to this is making sure other developers know that the event handlers exist.
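In outline (the names here are illustrative rather than the real thing), it amounted to something like:

-- Rough sketch only: the audit table plus the logging procedure the
-- packages call from their start, end and OnError events.
CREATE TABLE dbo.EtlAudit
(
    AuditId       int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    PackageName   nvarchar(200)    NOT NULL,
    InstanceId    uniqueidentifier NOT NULL,
    EventType     varchar(20)      NOT NULL,   -- 'Start', 'End', 'Error'
    ErrorSeverity tinyint          NULL,
    RowsProcessed int              NULL,
    ErrorMessage  nvarchar(max)    NULL,
    LoggedAt      datetime2(0)     NOT NULL DEFAULT SYSDATETIME()
);
GO
CREATE PROCEDURE dbo.LogEtlEvent
    @PackageName   nvarchar(200),
    @InstanceId    uniqueidentifier,
    @EventType     varchar(20),
    @ErrorSeverity tinyint       = NULL,
    @RowsProcessed int           = NULL,
    @ErrorMessage  nvarchar(max) = NULL
AS
BEGIN
    SET NOCOUNT ON;
    INSERT dbo.EtlAudit (PackageName, InstanceId, EventType,
                         ErrorSeverity, RowsProcessed, ErrorMessage)
    VALUES (@PackageName, @InstanceId, @EventType,
            @ErrorSeverity, @RowsProcessed, @ErrorMessage);
END;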
What we got out of this was run durations for various portions of the package, the run duration of the packages, and the amount of data being moved around. If we were expecting 3,000 records and got 300,000 we knew something went wrong. Ditto with getting 0 records.
The intention was to integrate a data-driven SharePoint distribution based on the creation of a critical event. We never got that far, and instead made the audit log a regular part of our checklists.
It worked great and gave us a lot of insight into the functioning of our processes and when something did a faceplant without generating an error.
October 14, 2015 at 7:08 am
One design consideration for an application event logging table is the acceptable latency between when the event occurs and when the record is available for querying. If we're talking about routinely accumulating more than one insert per second, then I would suggest having the application batch the inserts into a text file and then bulk insert into the table when either a threshold of (1000, 10000, ?) records is reached or (10, 60, ?) seconds have elapsed.
Inserting directly into a table each time an event occurs (like when a website user clicks a link or each time a car passes through a toll station) is not scalable on a relational database.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
October 15, 2015 at 4:00 am
Eric M Russell (10/14/2015)
One design consideration for an application event logging table is the acceptable latency between when the event occurs and when the record is available for querying. If we're talking about routinely accumulating more than one insert per second, then I would suggest having the application batch the inserts into a text file and then bulk insert into the table when either a threshold of (1000, 10000, ?) records is reached or (10, 60, ?) seconds have elapsed. Inserting directly into a table each time an event occurs (like when a website user clicks a link or each time a car passes through a toll station) is not scalable on a relational database.
I very much agree with this!
October 15, 2015 at 4:05 am
How does the APP write to a text file, for later bulk-import, in a multi-user friendly way? Isn't it going to have to open the file, lock it, read the last page, append the new record, write the last page, close the file?
Or is there some smarter way?
Or is the suggestion that each user/session has a separate file, buffered in memory, which is periodically flushed & closed (and new file opened when necessary) to allow bulk import? (That would cause me some worries about being able to get at the errors in the event of a crash, as it might be that that is the very time when I need the log data).
Maybe this data could be "queued" to some other process? For example I haven't (previously!) considered Service Broker for this type of task - perhaps I should?
October 15, 2015 at 4:21 am
Kristen-173977 (10/15/2015)
How does the APP write to a text file, for later bulk-import, in a multi-user friendly way? Isn't it going to have to open the file, lock it, read the last page, append the new record, write the last page, close the file?Or is there some smarter way?
Or is the suggestion that each user/session has a separate file, buffered in memory, which is periodically flushed & closed (and new file opened when necessary) to allow bulk import? (That would cause me some worries about being able to get at the errors in the event of a crash, as it might be that that is the very time when I need the log data).
Maybe this data could be "queued" to some other process? For example I haven't (previously!) considered Service Broker for this type of task - perhaps I should?
As so often, the most suitable method depends on the situation. It might be implemented as a mail-spool-style folder with one file per "message". The App writes a new file using a filename that contains a timestamp and a UUID. Then another asynchronous process pulls those files in, batches them and takes care of the rest. That way, even if your app crashes, nothing is lost and results show up in the database with minimal delay!