October 19, 2012 at 7:56 am
I'm researching to create a C# CLR trigger that implements the auditing for any table in a database. One generic piece of code would be able to handle the auditng for any table in the db. The experiments look good, except for one thing: I can not get for which table the trigger was fired.
Here's some experimental c# code I've got so far.
using System;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
namespace Jfh.Sql.Auditing
{
public partial class AuditTriggers
{
static AuditTriggers()
{
SqlPipe pipe = SqlContext.Pipe;
SqlTriggerContext context = SqlContext.TriggerContext;
pipe.Send(String.Format("static constructor called: column count = {0}, Trigger action = {1}.", context.ColumnCount, context.TriggerAction));
}
public AuditTriggers()
{
SqlPipe pipe = SqlContext.Pipe;
SqlTriggerContext context = SqlContext.TriggerContext;
pipe.Send(String.Format("public constructor called: column count = {0}, Trigger action = {1}.", context.ColumnCount, context.TriggerAction) );
}
[Microsoft.SqlServer.Server.SqlTrigger (Name="AuditTrigger", Target="dbo.Table1", Event="FOR INSERT, UPDATE, DELETE")]
public static void AuditTrigger()
{
SqlPipe pipe = SqlContext.Pipe;
SqlTriggerContext context = SqlContext.TriggerContext;
pipe.Send(String.Format("Trigger fired: column count = {0}, Trigger action = {1}.", context.ColumnCount, context.TriggerAction));
using (SqlConnection conn = new SqlConnection(@"context connection=true"))
{
conn.Open();
SqlCommand cmd = conn.CreateCommand();
cmd.CommandType = CommandType.Text;
cmd.CommandText = @"select @@procID, OBJECT_SCHEMA_NAME(@@PROCID) + '.' + OBJECT_NAME(@@PROCID);";
SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.SingleResult | CommandBehavior.SingleRow | CommandBehavior.SequentialAccess);
if (reader.Read())
{
SqlInt32 object_id = reader.GetSqlInt32(0);
SqlString object_name = reader.GetSqlString(1);
pipe.Send(String.Format("{0}, {1}", object_id, object_name));
}
else
{
pipe.Send("read found no data");
}
}
}
}
}
I deployed the resulting assembly on my server and then execute the following T-SQL to create 2 test tables plus a trigger on each table:
create table dbo.Table1 (
ID int identity(1,1) not null,
name nvarchar(256) not null,
constraint PK_Table1 primary key clustered (ID)
);
go
CREATE TRIGGER [dbo].[AuditTrigger1]
ON [dbo].[Table1]
FOR INSERT, DELETE, UPDATE
AS EXTERNAL NAME [Jfh.Sql.Auditing].[Jfh.Sql.Auditing.AuditTriggers].[AuditTrigger]
go
create table dbo.Table2 (
ID int identity(1,1) not null,
name nvarchar(256) not null,
constraint PK_Table2 primary key clustered (ID)
);
go
CREATE TRIGGER [dbo].[AuditTrigger2]
ON [dbo].[Table2]
FOR INSERT, DELETE, UPDATE
AS EXTERNAL NAME [Jfh.Sql.Auditing].[Jfh.Sql.Auditing.AuditTriggers].[AuditTrigger]
go
Now when I execute this test script:
insert dbo.Table1( name)
values( 'test1'),
('test2')
update dbo.Table1
set
name = replace(name, 'test', reverse('test'))
delete dbo.Table1
insert dbo.Table2( name)
values( 'test1'),
('test2')
update dbo.Table2
set
name = replace(name, 'test', reverse('test'))
delete dbo.Table2
The output is:
Trigger fired: column count = 2, Trigger action = Insert.
991838031, Null
(2 row(s) affected)
Trigger fired: column count = 2, Trigger action = Update.
991838031, Null
(2 row(s) affected)
Trigger fired: column count = 2, Trigger action = Delete.
991838031, Null
(2 row(s) affected)
Trigger fired: column count = 2, Trigger action = Insert.
991838031, Null
(2 row(s) affected)
Trigger fired: column count = 2, Trigger action = Update.
991838031, Null
(2 row(s) affected)
Trigger fired: column count = 2, Trigger action = Delete.
991838031, Null
(2 row(s) affected)
I need a way for the CLR code to know whether it was called for Table1, for Table2 or any other table in the database.
The test code above tries to use @@procID for the purpose, but -as documented here- @@procid does not return an object_id when called from within an in-process CLR routine. It does return some value but it is the same value when called for either table, and it is not a value that is to be found in sys.objects.
Does anyone have more suggestions for a way to determine what table the trigger fired for?
October 19, 2012 at 4:21 pm
Gosh... I don't know how your code will eventually turn out but I just got done replacing all of the generic CLR Audit Triggers in my system with hard-coded T-SQL that was created by a stored proc because the CLR triggers where so bloody slow. If your triggers are going to save to an "EAV" style audit table where only the columns that changed are saved, then your trigger will have to load both the INSERTED and DELETED table into memory arrays (whether one row at a time or all the rows) and that just takes too long. Then you also have to write back to the Audit table.
On my system, such a generic CLR trigger took 4 minutes to log the updates on just 5 out of 137 columns (yeah, I know... I didn't design these tables) for 10,000 rows.
The new automatically generated hard-coded triggers are almost instantaneous.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 21, 2012 at 2:20 am
We currently do have generated T-SQL auditing triggers. Each trigger gets generated by a stored procedure. These triggers generate an xml file from the inserted & deleted tables and send this into a service broker service to be processed outside the end-user transaction. The auditing mechanism is used o.a. for logging purposes and at the same time for signaling changes that need to be interfaced to external systems. It's been like this in production for several years on our inhouse oltp system with +1000 users 24/7 and performance and reliability is overall great.
However, the more rows are affected by a single audited statement, the larger the xml documents become. When less than -say- 100 rows are affected, the trigger is fast enough in generating the xml's (<80ms range, but most of the time much faster) . But when we go over 1000 rows (or so, depending on several factors) the auditing triggers will slow down the end user transaction significantly. We've found that this is mostly due to generating the one big xml document: we've seen situations where it took up to 15 minutes to generate one. And even some where we had to cancel the operation and break the end user action into smaller pieces to even get it go through. Generating several small documents with in total the same content and processing those was often done in seconds.
My experiments with a CLR routine are primarily aimed at making the auditing trigger automatically split the output into smaller pieces. The CLR routine can make use of streaming (for both the input data and the xml output) to prevent having to read the entire result set into memory while processing it, this should help reduce memory pressure. Also it can pre-compile and cache the table definitions for as long as the CLR is in memory, so I think I can do a lot to prevent performance issues from the CLR. I've in the past already tried splitting the xml in T-SQL but that only made things dramatically slower for even the smaller actions. Any way, I just want to try this, so I can actually compare the both methods.
The CLR also has an additional advantage: we also want a SSB service that controls which tables are audited, i.e. which tables need triggers. It's a lot easier to issue a single create trigger command pointing to the assembly than having to re-create the entire t-sql trigger code whenever a change occurs. In the CLR we can retrieve the table definitions just-in-time when the auditing trigger gets called and cache the compiled definitions for future calls for the same table. By also registering the assembly for specific DDL triggers we can simply have the affected cached table definitions flushed when a DDL change occurs on the audited table, and the table definition will be re-retrieved upon the next auditing call for that table. Because of the asynchronous nature of SSB services we would have to setup a separate service that monitors the DDL changes and block all operations while we are regenerating the T-SQL audit triggers when we do this using T-SQL triggers.
Next to the number of rows one other factor influences the size of the generated xml and thus the performance: the number of columns in a table. A secondary goal of the CLR experiment is therefor to make the auditing only include the columns in the xml that are actually needed. For our purposes the needed columns can change over time: every change would require the T-SQL triggers to be regenerated and this needs to be done at the time the change takes place (again due to the asynchronous usage pattern). Because from the CLR routine we're much "closer" to the table definition I expect the CLR method to impact the end-user experience a lot less than the current T-SQL method. But I need to be able to do some experiments before I can really say something about the results.
And before anyone else mentions it: this code has to support SQL server 2005 and above, so that alone rules out the use of MS CDC. Plus, CDC wasn't as flexible in supporting the secondary goals anyway.
I'll still do some performance comparisons, but without knowing for which table the auditing trigger was fired, I know I can't complete the project. So all suggestions are welcome.
And Jeff: the project that you helped me earlier with is currently on hold. If I do get to continue I'll still let you know the outcome.
October 21, 2012 at 10:01 am
Gosh, thanks for the extensive writeup. Unfortunately, the only way I've seen to get the table name through a CLR trigger is a nasty hack that uses a tablenameIDnaming convention on the primary key. That, of course, isn't very reliable because someone could change the ame of the PK at anytime and, unless you invoke a DDL trigger of somesort to enforce such a namig convention on new tables, there's also no guarantee that new tables will follow the correct convention.
Equally as unfortunate, your writeup still makes me think that you're in for even more headaches and performance issues especially since you're considering auditing only the columns that actually changed. If you want to actually compare values between the INSERTED and DELETED tables, then a CLR is still going to be slow on larger data inserts/updates because your CLR will have to load the INSERTED and DELETED tables into memory first. Even if you only use the equivalent of "IF UPDATED(columnname) in your CLR (whatever that equivalent may be... I'm not a C# programmer), you still might not be OK because you have to read the trigger tables into memory before you can interrogate them in a CLR. Of course, that can all be avoided using the hard-coded triggers we spoke of earlier.
You spoke of a sproc on each table (if I read that correctly) to initially generate and update the hard-coded triggers. I have been able to create a generic proc that will do that for you just by passing it the name of the table you want to create the trigger for. It uses the same method that you're talking about moving to... auditing only the changes and it does so in a "common" audit table. I've not done it myself because I don't audit but a handful of 8 critical tables but I believe you could write a DDL trigger that would detect any schema changes to the tables and auto-magically rebuild the audit triggers on whatever table changed.
Actually, so of that's not quite right. I audit the changes and send them to a common audit "Staging" table that only has one index to prevent any page splits (all new audit rows go to the logical end of the table). That index, of course, is based on an IDENTITY column having nothing to do with the data on each row.
I also have a scheduled job that runs once a minute to transfer the audit data from the staging table to the final audit table. Since each row has the table and column name recorded in it as well as the type of action (Insert/Update/Delete) that caused the row to be audited, I would imagine that you could take such a thing further by using the info in the staging table to create your XML rather than trying to do it "in process" during a CLR trigger.
Of course, that process also clears rows from the staging table once they've been moved. You could do the same thing once the rows have been converted to XML.
The bottom line is that I think you might actually make performance matters worse by trying to do any of this auditing with a CLR not because CLR is bad but because you end up needing to load the entirety of both trigger tables in order to do anything real with them. A hard-coded trigger using "IF UPDATE(columname)" will run much faster especially for "record changes only" types of triggers especially on wide tables.
One other thing that you need to worry about whether you use a CLR or a hardcoded trigger. BLOB and certain other datatypes aren't auditable through a trigger. You can tell what action was taken but you can't actually directly audit the changes that occurred. You would need to audit such columns by making a trip to the real table and picking up the new value to store. There's no way that I'm aware of to audit the "before" condition of the data.
Last but not least, you spoke of using Service Broker to move some of this data. Wouldn't it be easier to setup some form of replication of the audit data and change the data to XML there? I could certainly be incorrect but it seems like you might be using Service Broker as a "poor man's" replictation.
And thanks for the feedback on the "other" project.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 21, 2012 at 10:08 am
Ahhhh... getting back to the original subject of how to get the table name... I found an interesting bit of code that Remi suggested at the following URL. I've not tried it but I believe it'll work and it's certainly more reliable than the primary key name hack. Here's the URL.
http://www.sqlservercentral.com/Forums/FindPost298963.aspx
--Jeff Moden
Change is inevitable... Change for the better is not.
October 21, 2012 at 1:03 pm
Jeff Moden (10/21/2012)
Gosh, thanks for the extensive writeup. Unfortunately, the only way I've seen to get the table name through a CLR trigger is a nasty hack that uses a tablenameIDnaming convention on the primary key. That, of course, isn't very reliable because someone could change the ame of the PK at anytime and, unless you invoke a DDL trigger of somesort to enforce such a namig convention on new tables, there's also no guarantee that new tables will follow the correct convention.
I dislike the naming convention approach just as much as you do. Besides, the databases that need to be audited are existing db's, so I will hardly be able to enforce such a convention. So this is a no-go for us.
Equally as unfortunate, your writeup still makes me think that you're in for even more headaches and performance issues especially since you're considering auditing only the columns that actually changed. If you want to actually compare values between the INSERTED and DELETED tables, then a CLR is still going to be slow on larger data inserts/updates because your CLR will have to load the INSERTED and DELETED tables into memory first. Even if you only use the equivalent of "IF UPDATED(columnname) in your CLR (whatever that equivalent may be... I'm not a C# programmer), you still might not be OK because you have to read the trigger tables into memory before you can interrogate them in a CLR. Of course, that can all be avoided using the hard-coded triggers we spoke of earlier.
I think you're missing a bit of information here: we can pass a modifier CommandBehavior.SequentialAccess to Command.ExecuteReader(), which makes that the reader returned will only keep the column in memory for as long as this is the current item being worked on. So we can read the data without having to load all results into memory. Instead the data is "streamed" through the CLR code: at any time only the value we're currently processing is in memory. There are a few restrictions on what you can and can't do with the data taking this approach, but using the SequentialAccess modifier we advance a "window" over the data that we're processing instead of reading it all then processing it all in one go. Also, the SequentialAccess mode ensures you CAN read large objects from the same stream. I do have to admit that I still need to verify whether both points also hold for a SqlCommand querying the Inserted and Deleted tables from a SqlTriggerContext, but I have little reason to believe they won't.
You spoke of a sproc on each table (if I read that correctly) to initially generate and update the hard-coded triggers. I have been able to create a generic proc that will do that for you just by passing it the name of the table you want to create the trigger for. It uses the same method that you're talking about moving to... auditing only the changes and it does so in a "common" audit table. I've not done it myself because I don't audit but a handful of 8 critical tables but I believe you could write a DDL trigger that would detect any schema changes to the tables and auto-magically rebuild the audit triggers on whatever table changed.
Sorry, I didn't explain that properly. We also have one generic procedure that generates the trigger(s) for any table. We're already successfully auditing over 50 tables with that, so we know how to do this. In fact we've even got multiple subscribers for the same audited data; each of the subscribers may be interested in another subset of tables or columns from the same tables.
Actually, so of that's not quite right. I audit the changes and send them to a common audit "Staging" table that only has one index to prevent any page splits (all new audit rows go to the logical end of the table). That index, of course, is based on an IDENTITY column having nothing to do with the data on each row.
We initially used this approach too, but we found that having a single table that all triggers write to causes a serious hot-spot, resulting a.o. in delays and even deadlocks between processes that without the auditing in place would never have had conflicts at all. It is even more complicated since our databases are replicated using merge replication. Merge replication -as you know- also uses triggers on the replicated tables which write into a shared set of tables. This combined with the fact that the order in which triggers are executed can't be influenced, makes that we can't predict nor influence whether our trigger or the replication's would fire first, resulting in unpredicatble locking patterns. We've found we could get around this by sending service broker messages instead of writing to a table.
I also have a scheduled job that runs once a minute to transfer the audit data from the staging table to the final audit table. Since each row has the table and column name recorded in it as well as the type of action (Insert/Update/Delete) that caused the row to be audited, I would imagine that you could take such a thing further by using the info in the staging table to create your XML rather than trying to do it "in process" during a CLR trigger.
Of course, that process also clears rows from the staging table once they've been moved. You could do the same thing once the rows have been converted to XML.
Your scheduled job introduces an additional delay of at least 30 seconds on average for each change to arrive at the destination; we need the changes to arrive practically real-time at the destination, so I would like to avoid batch processes like these as much as possible.
The bottom line is that I think you might actually make performance matters worse by trying to do any of this auditing with a CLR not because CLR is bad but because you end up needing to load the entirety of both trigger tables in order to do anything real with them. A hard-coded trigger using "IF UPDATE(columname)" will run much faster especially for "record changes only" types of triggers especially on wide tables.
Our current T-SQL audit triggers generate the xml document in one single statement; it's a cross apply generating the elements for each row using a "union all select <name>, <value> where <conditions> to decide which columns to include in the xml. One of the conditions is indeed the update(columnname). The SqlTriggerContext object available to the CLR has an equivalent array of boolean values that can server the exact same purpose. As I explained, the SequentialAccess method doesn't require all data to be read at once and thus shouldn't give the memory pressure that you're (rightfully) afraid of.
One other thing that you need to worry about whether you use a CLR or a hardcoded trigger. BLOB and certain other datatypes aren't auditable through a trigger. You can tell what action was taken but you can't actually directly audit the changes that occurred. You would need to audit such columns by making a trip to the real table and picking up the new value to store. There's no way that I'm aware of to audit the "before" condition of the data.
As I explained above, I haven't checked this yet, but the CLR shouldn't have more problems with this than a T-SQL implementation. I do know we've already included nvarchar(max) in our T-SQL triggers and this works as expected. The only thing is that we had to do some pretty bad tricks to ensure all comparisons were performed case and accent sensitive. In c# all string comparisons are case and accent sensitive by default, so I think we may even have an advantage there for the CLR approach.
Last but not least, you spoke of using Service Broker to move some of this data. Wouldn't it be easier to setup some form of replication of the audit data and change the data to XML there? I could certainly be incorrect but it seems like you might be using Service Broker as a "poor man's" replictation.
The data in our databases is already replicated and MS replication doesn't allow the data to be re-replicated (other than choosing transactional replication for the one and merge for the other -which is a setup we also applied already and introduces a whole new world of issues :(- ). On top of that replication is very restrictive in the SQL server versions that you can replicate into/from. Our oltp systems are still at SQL server 2005 and there is little reason to upgrade to the much more expensive licensing scheme of 2008 or 2012. Yet many systems that it communicates with are at other versions: 2008 and even some 2012 systems are under construction (one of them is the previous project). Luckily we eliminated the last of our 6.5 systems little more than a year ago, but we will always have the requirement to link between versions. Service broker does not restrict us like replication does, so it it is perfect for our purpose.
So, I hear your warnings, but I still think it's worth at least the experiment. The only issue I have is that I do not want to have to write/generate and then compile a separate CLR routine for each table. Because that eliminates a lot of the advantages the CLR gives us over the T-SQL approach. On the other hand, without a way to determine the table the CLR audit trigger was fired for, the entire setup doesn't have any chance of succeeding.
October 21, 2012 at 1:12 pm
R.P.Rozema (10/21/2012)
So, I hear your warnings, but I still think it's worth at least the experiment. The only issue I have is that I do not want to have to write/generate and then compile a separate CLR routine for each table.
Stongly agree with you on both points. Like my Dad says, "One experiment is wortha thousand 'expert' opinions." 😀
On the other hand, without a way to determine the table the CLR audit trigger was fired for, the entire setup doesn't have any chance of succeeding.
Did you see my last post above? That might be way but don't know how it would be implemented 'cause I'm not a C# Programmer.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 21, 2012 at 1:18 pm
Jeff Moden (10/21/2012)
Ahhhh... getting back to the original subject of how to get the table name... I found an interesting bit of code that Remi suggested at the following URL. I've not tried it but I believe it'll work and it's certainly more reliable than the primary key name hack. Here's the URL.
Yeah, I had tried that. But this doesn't work when inside a CLR: @@procID doesn't return a valid object_id when called from a CLR trigger. I have not been able yet to find what the number does represent, but it is the same number no matter from which table the CLR audit trigger got fired (reference).
But: your suggestion has given me the idea that maybe I can create a T-SQL trigger that calls a CLR procedure; the T-SQL code CAN get a @@procID that is unique per table. All it needs to do is pass this to the CLR... I'll let you know what happens monday morning!
October 22, 2012 at 3:40 am
Calling a CLR procedure passing in the @@procid from a T-SQL trigger works. But, if I do this, I don't have access to the SqlTriggerContext object inside the CLR code anymore. I think I can do without that better than without knowing which table was affected, so I'm working my way through the various issues one step at a time.
Edit: I was wrong. The SQLCLR procedure not only has no access to the SqlTriggerContext object, also the inserted and deleted tables are not accessible and I can not do without the inserted and deleted pseudo tables. These are apparently only accessible for SQLCLR triggers and not for SQLCLR procedures called from a T-SQL trigger... (Sigh, why MS?, WHY!?)
October 23, 2012 at 2:49 am
An extensive search on the internet has revealed that the functionality I need (getting the parent object that was affected by the action that the trigger fired for) is not available. It has been requested at least as far back as 2006 (by Erland Sommarskog), but it has today still not been made available.
Erland's bug report was rejected by MS because he requested a change in the behavior of @@procID.
The same functionality that Erland tried to achieve can however more easily be implemented by adding a single property in the SqlTriggerContext object that indicates the parent object the trigger fired for. i.e. All we need is a single int property "ParentID". Please help getting the message through to Microsoft by voting UP on my Connect feature request:
October 5, 2016 at 1:40 am
If your problem is still current you can get an inspiration from this article on the Code Project web site: Data Version Control on MS SQL Server (using a generic trigger).
The article shows how to ensure the parent table name to be available in a SQLCLR generic trigger context by a simple way.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply