Many companies which have transactional applications, working with government or for many other reasons they have very strong requirements for the audit of data changes. In many other cases data changes auditing is the internal business requirement. Many companies using audited data and shows it via front end or generate business reports from it.
In present days of increased corporate scrutiny and regulatory requirements, you want to be sure that your integration architecture does not introduce hidden decisions that may affect your compliance. For example, medical trial companies have a strong FDA regulation that required to keep record of basically any data changes in a database. In addition, clients very often would like to make sure that their data getting flagged and can’t be modified silently without an ability to trace any modifications and in case of mistakenly modified the data have an ability to be corrected by setting the old value back.
In such cases I would suggest that you have to have full and detailed history of the record changes which can be shared with the business stakeholders of the system.
There are few ways the companies are trying to setup the audit solutions. One way is to use a third party products such Lumigent Entegra (http://www.lumigent.com/products/entegra_sql.html)
The other companies are trying to create their own home grown solutions.
There are many articles about different types and different ways for the audit architecture. The great series of articles (4 articles) posted in SQLServerCentral.com by Steve Jones http://www.sqlservercentral.com/columnists/sjones/auditingyoursqlserverpart4selectiveauditing.asp
And all the articles are great. But there is something in addition to the auditing theory and simple solution examples is missing in most of them.
Is your organization faced with pending audits resulting in long and tedious meetings to try to achieve compliance? Have you been trying to figure how you are going to develop a code to capture the data needed for the continuous audits in between the remainder of the company projects?
Did you try for example to implement those
solutions in the environment with 100 and more tables, where each customer have
their own similar but slightly different database (for example in medical trial
or accounting firm) and number of new databases 100+ during the year? And you
will figure out quickly that the manual creation of an audit objects is not
really working.
In this article I am not
going to explain various scenarios for auditing (it was done in many other
articles) but would like to show the automated way to setup an audit system and
describe the generator which is automatically creating the full audit solution
regardless of the database structure. Eventually,
it will be done for the one audit solution which was implemented for one of my
clients. And I can say that solution is working without any issues for almost a
year.
When I start working to architect the
solution some of the requirements were already placed by business and
application, and report developers. So, I was staying in boundaries and suppose
to keep the minimal database changes
First, solution is based on the assumption that the audit trail tables should have the same structure as audited tables and will be created for each main table.
The code for this method is usually inserting the old row in the audit table when any data changes are in the main table.
This can be implemented by using a trigger that inserts the data in the audit table. This method also does not require a primary key as it is just saving the before versions of rows updated or inserted.
But to be able to distinguish the row that was modified or inserted each row from main table should have a row identifier or a key.
Further in the article I will use term modified for both updated and inserted rows. It was very sad that developers did the initial database design
without consideration of data auditing (what is surprise!) and only at the last moment DBA/Architect was invited when it was too late to make main
schema changes. Known history, isn’t it? So, as an architect I placed some additional rules which were not really affecting any existing development.
1. Each table should have unique row identifier based on one column row_id
2. Column row_id will be an identity column in tables where there is no identity column and row_id will be a computed column with value equal the identity column value if table already has identity column. It may sound strange but applications required easily and uniquely identify the row if primary key consists of few columns or is not exists at all (should it be with proper design?). Is it surprise you that some tables are without primary keys?
Developers are telling me from time to time that this is a given design and I can’t change it or that there are cases when primary key (even the fake one) is not necessary.
This is a known fact that many developers creating a unique index instead of a primary key arguing that it gets the same effect. But let’s focus on our task instead of keeping good discussion about necessity of primary key.
3. Audit table is keeping the changed row, not an old row. This was done to increase performance of the reports and eliminate the necessity of audit and audited table joint to get the old and current values.
4. Each history table has modid field to show all the rows which are changed in one modification
5. Each audit table will have the same name as the main user table but with prefix ‘a_’
Some other business requirements were posted to complicate an audit solution. But I am omitting them for sake of simplifying the article’s implementation and give you a working idea of an audit generator. Let’s check the solution for one table.
Create table customer ( cust_id int primary key, cust_name varchar(50), row_id int identity(1,1) ) Create table order ( order_id int identity(1,1) primary key, order_desc varchar(50), row_id as order_id)
Table customer created with row_id as identity column because cust_id is not an identity. Table order has row_id as computed column from order_id because order_id is identity column.General modification table can be created next way:
Create modif (modid int identity(1,1), tablenm varchar(50), dml_type char(1), insertdt datetime default getdate() , insertuser varchar(128) default suser_sname(), insertmachine varchar(128) default host_name(), insertprocess varchar(128) default 'App=(' + rtrim(isnull(app_name(), '')) + ') Proc=(' + isnull(object_name(@@procid), '') +')' )
DML_type can be I for insert, U for update, D for delete.
Create table a_customer(cust_id int, cust_name varchar(50), row_id int, modid int, audit_id int identity(1,1) primary key ) Create table a_order ( order_id int , order_desc varchar(50), row_id as order_id, modid int, audit_id int identity(1,1) primary key )
Next step shows the trigger for the table customer. This trigger is generic solution for any table in database. If multiple triggers exists for the same table for the same event (UPDATE, INSERT, and DELETE) then audit trigger should be the last one among them.
if exists (select * from dbo.sysobjects where name = 't_customer_A' and type = 'TR' ) drop trigger dbo.t_customer_A GO SET QUOTED_IDENTIFIER ON SET ANSI_NULLS ON GO CREATE TRIGGER t_customer_A ON customer AFTER INSERT,UPDATE,DELETE AS BEGIN declare @tblname varchar(128) SET NOCOUNT ON set @tblname = 'customer' select * into #td from deleted select * into #ti from inserted exec p_trigger @tblname = @tblname END GO SET QUOTED_IDENTIFIER OFF SET ANSI_NULLS ON GO sp_settriggerorder @triggername= 't_customer_A', @order='LAST', @stmttype = 'INSERT' sp_settriggerorder @triggername= 't_customer_A', @order='LAST', @stmttype = 'UPDATE' sp_settriggerorder @triggername= 't_customer_A', @order='LAST', @stmttype = 'DELETE'
As you can see the only change for another table will be the value of the variable @tblname.
Now we need a stored procedure p_trigger to add record(s) to the audit table. This procedure dynamically creating the insert statement for the audit table and insert rows into the audit table based on the passed table name value. Procedure p_trigger is inserting the row with general information about the modification into the table modif. But before seeing the source code for the procedure p_trigger let’s find out what is required to implement a generic solution for all tables in any database?
- Generate script for each table to add column row_id and add it
- Generate audit table script for each user table in database and create it
- Generate trigger for each table
- Create wrapper to get audit columns and objects for each table in database.
Set of stored procedures created in database is producing the necessary result.
Procedure p_add_row_id produces the statement to add the column row_id if column is not exists in table.
Procedure p_generate_audit_table generates drop and create statement for the audit table
Procedure p_generate_audit_trigger generates statement to drop and create the audit trigger for the table
And finally, procedure p_generate_audit is the wrapper which generating the script for all tables in database.
If you compile the stored procedures in database and run the wrapper p_generate_audit the script will be generated. Here is the output produced by the generator for the database Pubs Pubs_GeneratedOutput.txt
The last portion of the equation is generic stored procedure p_trigger. This procedure is actually inserting the rows to the audit and modification tables.
SET QUOTED_IDENTIFIER ON SET ANSI_NULLS ON GO CREATE PROCEDURE p_trigger @tblname VARCHAR(128) as BEGIN declare @cnti int, @cntd int, @maxid int, @minid int, @cmd varchar(4000), @audittblname varchar(128), @cmdINSERT varchar(1000), @cmdSELECT varchar(1000), @modif_id bigint, @cmdFROM varchar(255), @AUDITINSERT varchar(255), @DUI varchar(10) set @audittblname = 'a_' + substring(@tblname,2,128) declare @tmp table (cname varchar(128), cid int identity(1,1) ) select @cnti = count(*) from #ti select @cntd = count(*) from #td -- check how many rows changed. If 0 then do nothing IF (@cnti = 0 and @cntd = 0) return 0 -- get all table columns insert into @tmp(cname) select COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @tblname select @maxid = max(cid), @minid = min(cid) from @tmp set @cmdSELECT = ' SELECT ' set @cmdINSERT = ' INSERT INTO ' + @audittblname + ' ( ' while (@minid <= @maxid) begin -- begin while select @cmdINSERT = @cmdINSERT + cname + ',' , @cmdSELECT = @cmdSELECT + 'd.' + cname + ',' FROM @tmp where cid = @minid set @minid = @minid + 1 end -- end while -- always set new rows for the AUDIT IF (@cnti = @cntd ) begin set @DUI = ' ''U'' ' SET @cmdFROM = ' FROM #ti d ' END IF (@cnti < @cntd) begin set @DUI = ' ''D'' ' SET @cmdFROM = ' FROM #td d ' END IF (@cnti > @cntd) BEGIN set @DUI = ' ''I'' ' SET @cmdFROM = ' FROM #ti d ' END -- insert record into table modif insert into modif(tablenm, dml_type) select @tblname, @dui -- get identity select @modif_id = SCOPE_IDENTITY( ) -- add modification column value set @AUDITINSERT = ' modid )' set @cmd = @cmdINSERT + @AUDITINSERT + @cmdSELECT + cast(@modif_id as varchar) + @cmdFROM exec (@cmd) SET NOCOUNT OFF END GO SET QUOTED_IDENTIFIER OFF SET ANSI_NULLS ON GO
CONCLUSION
It is possible to make all the procedures more sophisticated and add features such as index creation for the column modifid and row_id.
For my client I created generator the way that all procedures were created in one central location and one of the additional parameters was parameter for the audit database name. It allows centralize the stored procedures in one database per server. Some other parameters allowed exclude tables or include only specific tables in database by passing comma-delimited string. So, many additional features can be implemented in the audit generator to satisfy many other scenarios.