Audit transformation in SSIS

  • Hi everyone. This is my first time posting on this forum so let me know if you need more information in order to answer my question.

    I'm working on a ETL-project and part of the assignent is to implement auditing. The audit transformation is working flawlessly, however I want to add an audit_id

    that will correspond to the respective dimension tables.

    Like I mentioned, I am able to retrieve data/information for every column in the audit transformation (contained in a table in my database) with an audit_id column that use

    identity (1,1) but I am unable to join them to the dimension tables audit_id column.

    I've been trying to solve it with a Lookup-transformation but it's been unsuccessful.

    I appreciate any feedback or answer.

    Thanks in advance!

    // Daniel

  • Hi Daniel,

    I will need a little bit more details in order to help you.

    - Which SQL Server version are you using?

    - As far as I've understood, you have implemented your own audit process. How does it works? Which auditing information are you logging?

    There is a built-in audit functionality in SQL Server 2008 which gathers the Names and IDs of your packages and tasks, but it is very basic.

    Kind Regards

    Paul Hernández
  • Thank you for your reply Paul.

    I'm using SQL Server 2012 (SP1), Developer Edition.

    I use a OLE DB source to extract some data from 3 different tables from a specific database. I then

    connect this component with the Audit transformation component. In the audit I selected the following information to audit:

    Execution start time, Machine name, User name, Package name and Task name.

    After that I create a data flow path between my audit transformation and an OLE DB destination where I load my Audit_table. When I created the audit_table

    I created a column called Audit_id with identity (1,1).

    This is as far as I get. Like I mentioned in my previous post I am unable to join the Audit_id in my audit_table with my audit_id columns in my various dimension tables.

    I have 6 dimension tables and 2 fact tables which currently have audit_id column with the value NULL.

    I hope this is all the information you need and if not please let me know.

    // Daniel

  • daniel-johnsson (3/7/2013)


    Thank you for your reply Paul.

    I'm using SQL Server 2012 (SP1), Developer Edition.

    I use a OLE DB source to extract some data from 3 different tables from a specific database. I then

    connect this component with the Audit transformation component. In the audit I selected the following information to audit:

    Execution start time, Machine name, User name, Package name and Task name.

    After that I create a data flow path between my audit transformation and an OLE DB destination where I load my Audit_table. When I created the audit_table

    I created a column called Audit_id with identity (1,1).

    This is as far as I get. Like I mentioned in my previous post I am unable to join the Audit_id in my audit_table with my audit_id columns in my various dimension tables.

    I have 6 dimension tables and 2 fact tables which currently have audit_id column with the value NULL.

    I hope this is all the information you need and if not please let me know.

    // Daniel

    I don't think I fully understand the structure of what you are doing.

    Are you saying that AuditId in your dim tables should be a many to one FK to AuditId in your Audit table?

    Is it just a case of retrieving the AuditId for a single insert to your Audit table and then passing that into subsequent data flows?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi Daniel,

    The problem here is how to look up the Audit_Id in your Audit table using one or more columns from your dimension or fact tables. You have the following information: Execution start time, Machine name, User name, Package name and Task name.

    How you can relate this two table, good question. If you store the creation time of the columns (I always create a CretatedOn and UpdatedOn column for Dimension tables and CreatedOn column for Fact tables) you can make the following:

    Get the Audit_Id where the CreateDT >Execution start time

    But you must do it after an execution and before the next execution in order to obtain one and only one Audit_Id.

    This approach does not work if you have different user, machines or packages running at the same time, it is just an idea and it seems to be not precise.

    Other approach that definitely works is to drop your separated Audit table and add audit columns to each of your dimension tables and fact tables. It is redundant and space consuming but I think it not so bad at all.

    Paul Hernández
  • Hi Phil,

    Yes that is exactly what I want to accomplish. Basically that I have Audit_ID with identity (1,1) that works as a PK in my Audit table and

    my respective dimension tables contain Audit_ID columns that should work as foreign keys to the Audit_ID PK in my Audit table.

    And just what you mentioned, I want to retrieve the Audit_ID from my Audit table and pass it on through the data flow to give that value

    to my various dimension tables.

    Also I should mention that I'm auditing every row that's passing through but the only value that seperates these is the Audit_ID column so I currently end up with

    128 rows which is identical (except for the Audit_ID). Can I use a Row Sampling component to filter it down to just 1 row or will this become a problem?

    Again, let me know if it's not clear what I currently have and what I want to accomplish. I'd be glad to provide the information if I can. Still rather fresh to both

    posting in SSC and working with SSIS.

  • Hi Paul,

    It's true that I could simply drop my audit table and log all my auditing in my various dimension and fact tables but that would not be according

    to my assignment I'm afraid. I would like to make a separate table that contains all my auditing.

    It might work as a last resort if all else fails however.

    I'm not entirely sure how your solution with CreatedOn and UpdatedOn works. Are you saying I should create the columns inside the package?

    And would I then have to create this on every dimension table I'm loading, since I am auditing every dimension table?

    I wouldn't mind doing that, I just want to make sure I understand your solution.

    I appreciate the help!

    // Daniel

  • daniel-johnsson (3/8/2013)


    Hi Phil,

    Yes that is exactly what I want to accomplish. Basically that I have Audit_ID with identity (1,1) that works as a PK in my Audit table and

    my respective dimension tables contain Audit_ID columns that should work as foreign keys to the Audit_ID PK in my Audit table.

    And just what you mentioned, I want to retrieve the Audit_ID from my Audit table and pass it on through the data flow to give that value

    to my various dimension tables.

    Also I should mention that I'm auditing every row that's passing through but the only value that seperates these is the Audit_ID column so I currently end up with

    128 rows which is identical (except for the Audit_ID). Can I use a Row Sampling component to filter it down to just 1 row or will this become a problem?

    Again, let me know if it's not clear what I currently have and what I want to accomplish. I'd be glad to provide the information if I can. Still rather fresh to both

    posting in SSC and working with SSIS.

    I think that you could achieve this as follows:

    1) Create a package-scoped integer variable in SSIS - 'AuditId', perhaps.

    2) Insert an ExecuteSQL task before your data flow - this will insert a row into the audit table and return the new AuditId into your AuditId variable.

    So the SQL would be something like

    insert dbo.Audit (col1, col2)

    select ...

    select scope_identity()

    3) Now you can add a derived column to any subsequent data flow and set its value to whatever is in the variable AuditId.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (3/9/2013)


    daniel-johnsson (3/8/2013)


    Hi Phil,

    Yes that is exactly what I want to accomplish. Basically that I have Audit_ID with identity (1,1) that works as a PK in my Audit table and

    my respective dimension tables contain Audit_ID columns that should work as foreign keys to the Audit_ID PK in my Audit table.

    And just what you mentioned, I want to retrieve the Audit_ID from my Audit table and pass it on through the data flow to give that value

    to my various dimension tables.

    Also I should mention that I'm auditing every row that's passing through but the only value that seperates these is the Audit_ID column so I currently end up with

    128 rows which is identical (except for the Audit_ID). Can I use a Row Sampling component to filter it down to just 1 row or will this become a problem?

    Again, let me know if it's not clear what I currently have and what I want to accomplish. I'd be glad to provide the information if I can. Still rather fresh to both

    posting in SSC and working with SSIS.

    I think that you could achieve this as follows:

    1) Create a package-scoped integer variable in SSIS - 'AuditId', perhaps.

    2) Insert an ExecuteSQL task before your data flow - this will insert a row into the audit table and return the new AuditId into your AuditId variable.

    So the SQL would be something like

    insert dbo.Audit (col1, col2)

    select ...

    select scope_identity()

    3) Now you can add a derived column to any subsequent data flow and set its value to whatever is in the variable AuditId.

    Instead of :

    insert dbo.Audit (col1, col2)

    select ...

    select scope_identity()

    Why not:

    insert dbo.Audir(col1,col2)

    output INSERTED.AuditID -- or what ever column is the Identity column

    select ...

    The output value(s) could also be put into a temporary table or table variable if needed first. It depends on how the INSERT is being done, is it a single row or multi-row entry.

  • Hi Phil and Lynn,

    I have been trying to solve my problem with your instuctions but I'm still not able to work it out I'm afraid. Maybe I am missing something obvious, so I will explain a bit more in detail how far I have got.

    I created an integer variable that I named AuditID and before I do my first load to my first staging table I insert a row in my Audit_table with the following t-sql:

    SET IDENTITY_INSERT dbo.STAGE_AUDIT ON

    INSERT INTO dbo.STAGE_AUDIT ([Audit ID], [Package Name], [Execution Start Time], [Machine name], [User name], [Task name])

    VALUES (1, 'Staging', '2013-03-13', 'DANIEL-SSIS', 'ECUTBILDNING\Administrator', 'Stage Geography')

    SELECT SCOPE_IDENTITY()

    This way it inserts the same information as my Audit Transformation would. Maybe this is not how the SQL statement should look like, but I'm honestly unsure what the statement should look like.

    Nevertheless, I'm successfully loading my variable with 1 in the Result Set section saying result name = 0, Variable name = User::AuditID. Also I'm using result set as single row.

    From this Execute SQL Task my data flow continues into my first load Data Flow task which I can summarize like this:

    A source where I retieve some data and insert it into my staging table.

    Next to it I use the same source, retrieving the same data but before I connect it to my destination I use an Audit Transformation and also a Row Sampling to only have 1 row as the output since they are all identical except for their AuditID. I then connect it to my destination table, which is my Audit_Table.

    Lastly I use a source which is my staging table that I loaded previously. My data path is connected with a Derived Column which looks as following:

    I replace my audit_id column with the expression audit_id == @User::AuditID and then I connect this Derived Column with my destination, which is the same table as my source.

    Sadly when I run this package my AuditID remains NULL in my staging table. Obviously I'm doing something wrong but that is why I'm here asking for help is it not 🙂

    Thanks again for your time and help, and I'm sorry if it's not very clear with the lack of images and the fact that it's all just text. Let me know if it's unclear or wether you need more information.

    // Daniel

  • daniel-johnsson (3/14/2013)


    Hi Phil and Lynn,

    I have been trying to solve my problem with your instuctions but I'm still not able to work it out I'm afraid. Maybe I am missing something obvious, so I will explain a bit more in detail how far I have got.

    I created an integer variable that I named AuditID and before I do my first load to my first staging table I insert a row in my Audit_table with the following t-sql:

    SET IDENTITY_INSERT dbo.STAGE_AUDIT ON

    INSERT INTO dbo.STAGE_AUDIT ([Audit ID], [Package Name], [Execution Start Time], [Machine name], [User name], [Task name])

    VALUES (1, 'Staging', '2013-03-13', 'DANIEL-SSIS', 'ECUTBILDNING\Administrator', 'Stage Geography')

    SELECT SCOPE_IDENTITY()

    This way it inserts the same information as my Audit Transformation would. Maybe this is not how the SQL statement should look like, but I'm honestly unsure what the statement should look like.

    Nevertheless, I'm successfully loading my variable with 1 in the Result Set section saying result name = 0, Variable name = User::AuditID. Also I'm using result set as single row.

    From this Execute SQL Task my data flow continues into my first load Data Flow task which I can summarize like this:

    A source where I retieve some data and insert it into my staging table.

    Next to it I use the same source, retrieving the same data but before I connect it to my destination I use an Audit Transformation and also a Row Sampling to only have 1 row as the output since they are all identical except for their AuditID. I then connect it to my destination table, which is my Audit_Table.

    Lastly I use a source which is my staging table that I loaded previously. My data path is connected with a Derived Column which looks as following:

    I replace my audit_id column with the expression audit_id == @User::AuditID and then I connect this Derived Column with my destination, which is the same table as my source.

    Sadly when I run this package my AuditID remains NULL in my staging table. Obviously I'm doing something wrong but that is why I'm here asking for help is it not 🙂

    Thanks again for your time and help, and I'm sorry if it's not very clear with the lack of images and the fact that it's all just text. Let me know if it's unclear or wether you need more information.

    // Daniel

    First things first, you don't need IDENTITY_INSERT here (otherwise the package will fail the second time you run it). Instead, do this:

    INSERT INTO dbo.STAGE_AUDIT ([Package Name], [Execution Start Time], [Machine name], [User name], [Task name])

    VALUES ('Staging', '2013-03-13', 'DANIEL-SSIS', 'ECUTBILDNING\Administrator', 'Stage Geography')

    SELECT SCOPE_IDENTITY()

    That's the whole point of using scope_identity() - it brings back whatever value SQL Server generated during the insert.

    Now, on to derived columns. Your Expression will be

    @User::AuditID

    (note - no equals signs)

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply