Validating Loaded Data in an SSIS Package

  • Hi Forum,

    I want to prove that all the records I've Extracted from a source into an SSIS Package are still present when I export out of the Package?

    I can add a Row Source but I'd like something a bit more thorough, like a profile of the data before & after to show it's the same data.

    Ideally, something that works on very large datasets, which Transformation should I use OR is there a script that would work better?

    Thanks

     

  • SOX requirement, by any chance?

    What is your data source?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Hi Phil,

    I don't have a specific file in mind, it's more a process of how to check that all records went through SSIS and all came out the other side.

    What options are there to check the results versus the input apart from RowCount?

    Thanks

  • As far as I know, there is no fixed way of doing this.

    I used to have a series of files which were loaded every month and we had to be sure that what was in the files was the same as what ended up being loaded to SQL Server. We also had to convince auditors. It took some significant development, but we came up with the following process.

    1. Using Python, analyse the source file and produce a series of metrics. The metrics include total row count, the sum of every numeric column, the maximum and minimum of every date column. The maximum and minimum of every string column.
    2. Load the data into SQL Server using SSIS.
    3. Using Python once again, compare the figures obtained in (1) against the results obtained by running relevant queries against the SQL Server database.
    4. If there are any mismatches, sound the alarm.

    Why Python? Because we wanted to use a non-MS technology to validate what MS was doing. Seemed like a good idea at the time.

     

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Hi Phil,

    That's exactly what I was looking for!

    I did find the Profiling Node in SSIS which gave me similar stats to what you extracted in Python.

    I also like the idea of using a separate technology like Python.

    Thanks heaps Phil.

  • Oscar_Boots wrote:

    Hi Phil,

    That's exactly what I was looking for!

    I did find the Profiling Node in SSIS which gave me similar stats to what you extracted in Python.

    I also like the idea of using a separate technology like Python.

    Thanks heaps Phil.

    And thanks to you for posting back. I'm glad you found it helpful.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply